Upload a large file without oom with Grizzly

Upload a large file without oom with Grizzly

Sébastien Lorber
Hello,

I would like to know if it's possible to upload a file with AHC / Grizzly in streaming mode, I mean without loading the whole file's bytes in memory.

The default behavior seems to allocate a byte[] which contains the whole file, so it means that my server can hit OOM if too many users upload a large file at the same time.


I've tried with the Heap and ByteBuffer memory managers, with reallocate=true/false, but with no more success.

It seems the whole file content is appended to the BufferOutputStream, and then the underlying buffer is written.

At least this seems to be the case with the AHC integration:
https://github.com/AsyncHttpClient/async-http-client/blob/6faf1f316e5546110b0779a5a42fd9d03ba6bc15/providers/grizzly/src/main/java/org/asynchttpclient/providers/grizzly/bodyhandler/PartsBodyHandler.java


So, is there a way to patch AHC to stream the file so that I could consume only, say, 20 MB of heap while uploading a 500 MB file?
Or is this simply impossible with Grizzly?
I didn't notice anything related to that in the documentation.

Btw in my case it is a file upload. I receive a file with CXF and have to transmit it to a storage server (like S3). CXF doesn't consume memory because it streams large file uploads to the file system, and then provides an input stream on that file.

Thanks



Re: Upload a large file without oom with Grizzly

Ryan Lubke-2


It's possible with the FeedableBodyGenerator. But if you're tied to using multipart uploads, you'd have to convert the multipart data to Buffers manually and send them using the FeedableBodyGenerator.
I'll take a closer look to see if this area can be improved.
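(For illustration, here's a minimal sketch of feeding manually, assuming the provider's FeedableBodyGenerator.feed(Buffer, boolean) method and Grizzly's default MemoryManager; the helper method name and chunk size are illustrative, not code from this thread.)

    // Needs: java.io.InputStream/IOException, org.glassfish.grizzly.Buffer,
    // org.glassfish.grizzly.memory.{Buffers, MemoryManager},
    // com.ning.http.client.providers.grizzly.FeedableBodyGenerator
    static void feedStream(FeedableBodyGenerator bodyGenerator, InputStream in) throws IOException {
        MemoryManager mm = MemoryManager.DEFAULT_MEMORY_MANAGER;
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            Buffer buffer = mm.allocate(read);
            buffer.put(chunk, 0, read);   // copy, so the byte[] can be reused safely
            buffer.flip();
            bodyGenerator.feed(buffer, false);           // not the last chunk yet
        }
        bodyGenerator.feed(Buffers.EMPTY_BUFFER, true);  // signal the end of the body
    }

Only one chunk at a time has to be resident on the producer side, so heap usage is bounded by the chunk size times however many buffers are queued, rather than by the file size.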

Re: Upload a large file without oom with Grizzly

Sébastien Lorber
Ok thanks!

I think I see what I could do, probably something like that:


    FeedableBodyGenerator bodyGenerator = new FeedableBodyGenerator();
    MultipartBodyGeneratorFeeder bodyGeneratorFeeder = new MultipartBodyGeneratorFeeder(bodyGenerator);
    Request uploadRequest1 = new RequestBuilder("POST")
            .setUrl("url")
            .setBody(bodyGenerator)
            .build();

    ListenableFuture<Response> asyncRes = asyncHttpClient
        .prepareRequest(uploadRequest1)
        .execute(new AsyncCompletionHandlerBase());


    bodyGeneratorFeeder.append("param1","value1");
    bodyGeneratorFeeder.append("param2","value2");
    bodyGeneratorFeeder.append("fileToUpload",fileInputStream);
    bodyGeneratorFeeder.end();

    Response uploadResponse = asyncRes.get();


Does it seem ok to you?

I guess it could be interesting to provide that MultipartBodyGeneratorFeeder class to AHC or Grizzly, since other people may want to achieve the same thing.






Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
Yes, something like that.  I was going to tackle adding something like this today.  I'll follow up with something you can test out.


Re: Upload a large file without oom with Grizzly

Sébastien Lorber
Ryan, thanks, it works fine. I'll make a pull request on AHC tomorrow with better code using the same Part classes that already exist.

I created an OutputStream that redirects to the BodyGenerator feeder.

The problem I currently have is that the feeder feeds the queue faster than the async thread that polls it :)
I need to expose a limit on that queue size or something; I'll work on that. It will be better than a thread sleep to slow down the file-part reading.
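(As an aside, here's a rough sketch of such an OutputStream-to-feeder bridge; the class name is hypothetical and this is not the pull-request code. It again assumes feed(Buffer, boolean), and it does not by itself solve the queue-growth problem mentioned above.)

    // Needs: java.io.OutputStream/IOException, org.glassfish.grizzly.Buffer,
    // org.glassfish.grizzly.memory.{Buffers, MemoryManager},
    // com.ning.http.client.providers.grizzly.FeedableBodyGenerator
    class FeedingOutputStream extends OutputStream {
        private final FeedableBodyGenerator generator;
        private final MemoryManager mm = MemoryManager.DEFAULT_MEMORY_MANAGER;

        FeedingOutputStream(FeedableBodyGenerator generator) {
            this.generator = generator;
        }

        @Override
        public void write(int b) throws IOException {
            write(new byte[] { (byte) b }, 0, 1);
        }

        @Override
        public void write(byte[] data, int off, int len) throws IOException {
            Buffer buffer = mm.allocate(len);
            buffer.put(data, off, len);     // copy, so the caller can reuse its array
            buffer.flip();
            generator.feed(buffer, false);  // not the last chunk
        }

        @Override
        public void close() throws IOException {
            generator.feed(Buffers.EMPTY_BUFFER, true);  // end of the body
        }
    }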



Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
Excellent!  Looking forward to the pull request!


Re: Upload a large file without oom with Grizzly

Sébastien Lorber
Hello,

I've made my pull request.

It works with my use case; the file is uploaded as before.



But I didn't notice a big memory improvement.

Is it possible that SSL doesn't allow streaming the body, or something like that?



In memory, I have a lot of:
- HeapByteBuffer
which are held by SSLUtils$3
which are held by BufferBuffers
which are held by WriteResult
which are held by AsyncWriteQueueRecord


Here is an example of the OOM stack trace:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.glassfish.grizzly.ssl.SSLUtils.allocateOutputBuffer(SSLUtils.java:342)
at org.glassfish.grizzly.ssl.SSLBaseFilter$2.grow(SSLBaseFilter.java:117)
at org.glassfish.grizzly.ssl.SSLConnectionContext.ensureBufferSize(SSLConnectionContext.java:392)
at org.glassfish.grizzly.ssl.SSLConnectionContext.wrap(SSLConnectionContext.java:272)
at org.glassfish.grizzly.ssl.SSLConnectionContext.wrapAll(SSLConnectionContext.java:227)
at org.glassfish.grizzly.ssl.SSLBaseFilter.wrapAll(SSLBaseFilter.java:404)
at org.glassfish.grizzly.ssl.SSLBaseFilter.handleWrite(SSLBaseFilter.java:319)
at org.glassfish.grizzly.ssl.SSLFilter.accurateWrite(SSLFilter.java:255)
at org.glassfish.grizzly.ssl.SSLFilter.handleWrite(SSLFilter.java:143)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$SwitchingSSLFilter.handleWrite(GrizzlyAsyncHttpProvider.java:2503)
at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:853)
at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:720)
at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.flushQueue(FeedableBodyGenerator.java:132)
at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.feed(FeedableBodyGenerator.java:101)
at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder$FeedBodyGeneratorOutputStream.write(MultipartBodyGeneratorFeeder.java:222)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at com.ning.http.multipart.FilePart.sendData(FilePart.java:179)
at com.ning.http.multipart.Part.send(Part.java:331)
at com.ning.http.multipart.Part.sendParts(Part.java:397)
at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder.feed(MultipartBodyGeneratorFeeder.java:144)




Any idea?




Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
I'll be reviewing the PR today, thanks again!

Regarding the OOM: as it stands now, for each new buffer that is passed to the SSLFilter, we allocate a buffer twice the size in order to
accommodate the encrypted result.  So there's an increase.

Depending on the socket configurations of both endpoints, and how fast the remote is reading data, it could
be that the write queue is becoming too large. We do have a way to detect this situation, but I'm pretty sure
the Grizzly internals are currently shielded here. I will see what I can do to allow users to leverage this.
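(A sketch of the kind of check being alluded to, assuming Grizzly 2.3's Connection.canWrite() and notifyCanWrite(WriteHandler); this is not AHC code, just an illustration of how a feeder could block until the async write queue drains instead of piling up wrapped SSL buffers.)

    // Needs: org.glassfish.grizzly.Connection, org.glassfish.grizzly.WriteHandler,
    // java.util.concurrent.CountDownLatch
    static void waitUntilWritable(final Connection<?> connection) throws InterruptedException {
        if (connection.canWrite()) {
            return;
        }
        final CountDownLatch latch = new CountDownLatch(1);
        connection.notifyCanWrite(new WriteHandler() {
            @Override
            public void onWritePossible() {
                latch.countDown();   // the write queue drained below its limit
            }

            @Override
            public void onError(Throwable t) {
                latch.countDown();   // stop waiting on error
            }
        });
        latch.await();
    }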




Re: Upload a large file without oom with Grizzly

Sébastien Lorber
I have seen a lot of buffers with a size of 33842, and it seems the limit is near half the capacity.

Perhaps there's a way to tune that buffer size so that it consumes less memory?
Is there an ideal Buffer size to send to the feed method?



Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
At this point in time, as far as the SSL buffer allocation is concerned, it's untunable.

That said, feel free to open a feature request.

As to your second question, there is no suggested size.  This is all very application specific.

I'm curious, how large of a file are you sending?



Re: Upload a large file without oom with Grizzly

Sébastien Lorber

I'm trying to send a 500 MB file for my tests, with a 400 MB heap.

In our real use cases we would probably have files under 20 MB, but we want to reduce memory consumption because we can have several parallel uploads on the same server, depending on user activity.

I'll check whether using this BodyGenerator reduces the memory footprint or whether it's almost the same as before.



Re: Upload a large file without oom with Grizzly

Sébastien Lorber


By chance, do you know if I can remove the MessageCloner used in the SSL filter?
SSLBaseFilter$OnWriteCopyCloner

It seems to allocate a lot of memory.
I don't really understand why messages have to be cloned. Can I remove this? How?


2013/8/29 Sébastien Lorber <[hidden email]>

I'm trying to send a 500m file for my tests with a heap of 400m.

In our real use cases we would probably have files under 20mo but we want to reduce the memory consumption because we can have x parallel uploads on the same server according to the user activity.

I'll try to check if using this BodyGenerator reduced the memory footprint or if it's almost like before.


2013/8/28 Ryan Lubke <[hidden email]>
At this point in time, as far as the SSL buffer allocation is concerned, it's untunable.

That said, feel free to open a feature request.

As to your second question, there is no suggested size.  This is all very application specific.

I'm curious, how large of a file are you sending?



Sébastien Lorber wrote:
I have seen a lot of buffers which have a size of 33842 and it seems the limit is near half the capacity.

Perhaps there's a way to tune that buffer size so that it consumes less memory?
Is there an ideal Buffer size to send to the feed method?



Re: Upload a large file without oom with Grizzly

Sébastien Lorber
Ryan, I've done some other tests.


It seems that using a blocking queue in the FeedableBodyGenerator is useless here, because the thread consuming it never blocks, so the queue never pushes back on the feeder, which was my intention in the first place. Maybe it depends on the IO strategy used?
I use the AHC default, which seems to be SameThreadIOStrategy, so I don't think it's related to the IO strategy.


So in the end I can upload a 70MB file with a 50MB heap, but I have to put a Thread.sleep(30) between each 100KB Buffer sent to the FeedableBodyGenerator.

The connection with the server is not good here, but in production it is normally a lot better as far as I know.



I've tried things like clientTransport.getAsyncQueueIO().getWriter().setMaxPendingBytesPerConnection(100000); but it doesn't seem to work for me.

I'd like the Grizzly internals to block when there are too many pending bytes to send. Is it possible?




PS: I've just been able to send a 500MB file with a 100MB heap, but it needed a sleep of 100ms between each 100KB Buffer sent to the body generator.
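
For clarity, here is roughly what that sleep-throttled feed loop looks like on my side. This is only a sketch: the SleepThrottledFeeder name is made up, the chunk size and sleep are the values mentioned above, and I'm assuming the feed(Buffer, boolean) signature of the AHC Grizzly FeedableBodyGenerator plus Grizzly's MemoryManager.DEFAULT_MEMORY_MANAGER and Buffers helpers.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;

    import org.glassfish.grizzly.Buffer;
    import org.glassfish.grizzly.memory.Buffers;
    import org.glassfish.grizzly.memory.MemoryManager;

    import com.ning.http.client.providers.grizzly.FeedableBodyGenerator;

    public class SleepThrottledFeeder {

        // Feed the file in ~100KB chunks, sleeping between each chunk so the
        // async write queue gets a chance to drain (crude back-pressure).
        public static void feed(File largeFile, FeedableBodyGenerator bodyGenerator)
                throws IOException, InterruptedException {
            byte[] chunk = new byte[100 * 1024];
            InputStream in = new FileInputStream(largeFile);
            try {
                int read;
                while ((read = in.read(chunk)) != -1) {
                    // Copy the bytes: the Buffer may still sit in the write queue
                    // after feed() returns, so the chunk array must not be reused.
                    Buffer buffer = Buffers.wrap(
                            MemoryManager.DEFAULT_MEMORY_MANAGER,
                            Arrays.copyOf(chunk, read));
                    bodyGenerator.feed(buffer, false);
                    Thread.sleep(100);                          // the crude throttle
                }
                bodyGenerator.feed(Buffers.EMPTY_BUFFER, true); // signal end of body
            } finally {
                in.close();
            }
        }
    }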





Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
Sébastien,

I'd recommend looking at Connection.canWrite() [1] and Connection.notifyCanWrite(WriteHandler) [1].

By default, Grizzly configures the async write queue length to be four times the write buffer size (which is based on the socket write buffer).
When the queue exceeds this limit, canWrite() will return false.

When this occurs, you can register a WriteHandler to be notified when the queue length drops below the configured maximum, and then simulate blocking
until the onWritePossible() callback has been invoked.

----------------------------------------------------------------

        final FutureImpl<Boolean> future = Futures.createSafeFuture();

        // Connection may be obtained by calling FilterChainContext.getConnection().
        connection.notifyCanWrite(new WriteHandler() {

            @Override
            public void onWritePossible() throws Exception {
                future.result(Boolean.TRUE);
            }

            @Override
            public void onError(Throwable t) {
                future.failure(Exceptions.makeIOException(t));
            }
        });

        try {
            final long writeTimeout = 30;
            future.get(writeTimeout, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            HttpTransactionContext httpCtx = HttpTransactionContext.get(connection);
            httpCtx.abort(e.getCause());
        } catch (Exception e) {
            HttpTransactionContext httpCtx = HttpTransactionContext.get(connection);
            httpCtx.abort(e);
        }

---------------------------------------------------------------------

[1] http://grizzly.java.net/docs/2.3/apidocs/org/glassfish/grizzly/OutputSink.html
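
To make this concrete, here is a minimal sketch of a feeder loop built around that pattern. It is a sketch only: the FileFeeder class and its method names are hypothetical, it assumes the feeder has a reference to the Grizzly Connection (e.g. obtained via FilterChainContext.getConnection()), it assumes feed(Buffer, boolean) as the FeedableBodyGenerator feed signature, and the 30-second timeout is arbitrary.

----------------------------------------------------------------

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.concurrent.TimeUnit;

    import org.glassfish.grizzly.Buffer;
    import org.glassfish.grizzly.Connection;
    import org.glassfish.grizzly.WriteHandler;
    import org.glassfish.grizzly.impl.FutureImpl;
    import org.glassfish.grizzly.memory.Buffers;
    import org.glassfish.grizzly.memory.MemoryManager;
    import org.glassfish.grizzly.utils.Exceptions;
    import org.glassfish.grizzly.utils.Futures;

    import com.ning.http.client.providers.grizzly.FeedableBodyGenerator;

    public class FileFeeder {

        // Feed the file in ~100KB chunks; whenever the async write queue is
        // full (canWrite() == false), park until onWritePossible() fires
        // instead of sleeping.
        public static void feed(File file,
                                Connection connection,
                                FeedableBodyGenerator bodyGenerator) throws IOException {
            byte[] chunk = new byte[100 * 1024];
            InputStream in = new FileInputStream(file);
            try {
                int read;
                while ((read = in.read(chunk)) != -1) {
                    if (!connection.canWrite()) {
                        blockUntilWritable(connection);
                    }
                    // Copy the bytes: the Buffer may be queued after feed() returns.
                    Buffer buffer = Buffers.wrap(
                            MemoryManager.DEFAULT_MEMORY_MANAGER,
                            Arrays.copyOf(chunk, read));
                    bodyGenerator.feed(buffer, false);
                }
                bodyGenerator.feed(Buffers.EMPTY_BUFFER, true); // last chunk
            } finally {
                in.close();
            }
        }

        // The notifyCanWrite()/WriteHandler pattern from above, wrapped so the
        // feeding thread simply blocks until the queue drops below its limit.
        private static void blockUntilWritable(Connection connection) throws IOException {
            final FutureImpl<Boolean> future = Futures.createSafeFuture();
            connection.notifyCanWrite(new WriteHandler() {

                @Override
                public void onWritePossible() {
                    future.result(Boolean.TRUE);
                }

                @Override
                public void onError(Throwable t) {
                    future.failure(Exceptions.makeIOException(t));
                }
            });
            try {
                future.get(30, TimeUnit.SECONDS); // arbitrary timeout (assumption)
            } catch (Exception e) {
                throw Exceptions.makeIOException(e);
            }
        }
    }

---------------------------------------------------------------------

Compared to the Thread.sleep() workaround, the feeding thread only parks when the write queue is actually full, so the feed rate adapts to how fast the remote end is reading.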


Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
In reply to this post by Sébastien Lorber
Probably not a good idea.  The cloner is invoked when the write queue can't write the message immediately and has to add it to the queue.  It's an added measure of safety, since the upstream buffer could be reused or recycled.


Sébastien Lorber wrote:


By chance, do you know if I can remove the MessageCloner used in the SSL filter?
SSLBaseFilter$OnWriteCopyCloner

It seems to allocate a lot of memory.
I don't really understand why messages have to be cloned. Can I remove this? How?



Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
In reply to this post by Ryan Lubke-2
Sébastien,

Could you also provide a sample of how you're performing your feed? 

Thanks,
-rl


Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
In reply to this post by Ryan Lubke-2
Sébastien,

I've committed some changes to FeedableBodyGenerator that may be of interest (they are based on the canWrite()/notifyCanWrite() approach from my earlier message), so
you may want to do a pull to get up-to-date.

I'm now looking at providing a Grizzly-optimized extension of the multipart API.


Re: Upload a large file without oom with Grizzly

Ryan Lubke-2
In reply to this post by Ryan Lubke-2
Please disregard.

Ryan Lubke wrote:
Sébastien,

Could you also provide a sample of how you're performing your feed? 

Thanks,
-rl

Ryan Lubke wrote:
Sébastien,

I'd recommend looking at Connection.canWrite() [1] and Connection.notifyCanWrite(WriteListener) [1]

By default, Grizzly will configure the async write queue length to be four times the write buffer size (which is based off the socket write buffer).
When the queue length exceeds this value, canWrite() will return false. 

When this occurs, you can register a WriteListener to be notified when the queue length is below the configured max and then simulate blocking
until the onWritePossible() callback has been invoked.

----------------------------------------------------------------

        final FutureImpl<Boolean> future = Futures.createSafeFuture();
       
        // Connection may be obtained by calling FilterChainContext.getConnection().
        connection.notifyCanWrite(new WriteHandler() {

            @Override
            public void onWritePossible() throws Exception {
                future.result(Boolean.TRUE);
            }

            @Override
            public void onError(Throwable t) {
                future.failure(Exceptions.makeIOException(t));
            }
        });
       
        try {
            final long writeTimeout = 30;
            future.get(writeTimeout, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            HttpTransactionContext httpCtx = HttpTransactionContext.get(connection);
            httpCtx.abort(e.getCause());
        } catch (Exception e) {
            HttpTransactionContext httpCtx = HttpTransactionContext.get(connection);
            httpCtx.abort(e);
        }

---------------------------------------------------------------------

[1] http://grizzly.java.net/docs/2.3/apidocs/org/glassfish/grizzly/OutputSink.html

Sébastien  Lorber wrote:
Ryan, I've done some other tests.


It seems that using a blocking queue in the FeedableBodyGenerator is totally useless, because the thread consuming it never blocks and so the queue never throttles the feeding, which was my intention in the first place. Maybe it depends on the IO strategy used?
I use the AHC default, which seems to be SameThreadIOStrategy, so I don't think it's related to the IO strategy.


So in the end I can upload a 70 MB file with a 50 MB heap, but I have to put a Thread.sleep(30) between each 100 KB Buffer sent to the FeedableBodyGenerator.

The connection with the server is not good here, but in production it is normally a lot better as far as I know.



I've tried things like clientTransport.getAsyncQueueIO().getWriter().setMaxPendingBytesPerConnection(100000); but it doesn't seem to work for me.

I'd like the Grizzly internals to block when there are too many pending bytes to send. Is that possible?




PS: I've just been able to send a 500 MB file with a 100 MB heap, but it needed a sleep of 100 ms between each 100 KB Buffer sent to the body generator.
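
(A minimal sketch of a feeding loop with that kind of back-pressure, combining Connection.canWrite() with a latch instead of a Thread.sleep(); feedChunk() is a placeholder for whatever actually pushes the Buffer, e.g. the body generator's feed call, and the timeout value is illustrative.)

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    import org.glassfish.grizzly.Buffer;
    import org.glassfish.grizzly.Connection;
    import org.glassfish.grizzly.WriteHandler;

    // Illustrative only: park the feeding thread while the async write queue
    // is full, instead of sleeping a fixed amount of time between chunks.
    public final class ThrottledFeedLoop {

        public static void feedWithBackPressure(Connection<?> connection,
                                                Iterable<Buffer> chunks) throws Exception {
            for (Buffer chunk : chunks) {
                if (!connection.canWrite()) {
                    final CountDownLatch writable = new CountDownLatch(1);
                    connection.notifyCanWrite(new WriteHandler() {
                        @Override
                        public void onWritePossible() {
                            writable.countDown();
                        }

                        @Override
                        public void onError(Throwable t) {
                            writable.countDown();   // unblock; the failure surfaces on the next write
                        }
                    });
                    writable.await(30, TimeUnit.SECONDS);   // illustrative timeout
                }
                feedChunk(chunk);
            }
        }

        private static void feedChunk(Buffer chunk) {
            // Hand 'chunk' to the body generator / write path here.
        }
    }

Note that canWrite() only starts returning false once a maximum pending-bytes value is configured on the transport, as discussed elsewhere in this thread.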




2013/8/29 Sébastien Lorber <[hidden email]>


By any chance, do you know if I can remove the MessageCloner used in the SSL filter?
SSLBaseFilter$OnWriteCopyCloner

It seems to allocate a lot of memory.
I don't really understand why messages have to be cloned; can I remove this? How?


2013/8/29 Sébastien Lorber <[hidden email]>

I'm trying to send a 500 MB file for my tests, with a 400 MB heap.

In our real use cases we would probably have files under 20 MB, but we want to reduce the memory consumption because we can have x parallel uploads on the same server, depending on user activity.

I'll try to check if using this BodyGenerator reduced the memory footprint or if it's almost like before.


2013/8/28 Ryan Lubke <[hidden email]>
At this point in time, as far as the SSL buffer allocation is concerned, it's untunable.

That said, feel free to open a feature request.

As to your second question, there is no suggested size.  This is all very application specific.

I'm curious, how large of a file are you sending?



Sébastien Lorber wrote:
I have seen a lot of buffers with a size of 33842, and it seems the limit is near half the capacity.

Perhaps there's a way to tune that buffer size so that it consumes less memory?
Is there an ideal Buffer size to send to the feed method?


2013/8/28 Ryan Lubke <[hidden email]>
I'll be reviewing the PR today, thanks again!

Regarding the OOM: as it stands now, for each new buffer that is passed to the SSLFilter, we allocate a buffer twice the size in order to
accommodate the encrypted result.  So there's an increase.

Depending on the socket configurations of both endpoints, and how fast the remote is reading data, it could
be the write queue is becoming too large.   We do have a way to detect this situation, but I'm pretty sure
the Grizzly internals are currently shielded here.  I will see what I can do to allow users to leverage this.




Sébastien Lorber wrote:
Hello,

I've made my pull request.

With my use case it works; the file is uploaded like before.



But I didn't notice a big memory improvement.

Is it possible that SSL doesn't allow streaming the body, or something like that?



In memory, I have a lot of:
- HeapByteBuffer
Which are held by SSLUtils$3
Which are held by BufferBuffers
Which are held by WriteResult
Which are held by AsyncWriteQueueRecord


Here is an example of the OOM stack trace:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.glassfish.grizzly.ssl.SSLUtils.allocateOutputBuffer(SSLUtils.java:342)
at org.glassfish.grizzly.ssl.SSLBaseFilter$2.grow(SSLBaseFilter.java:117)
at org.glassfish.grizzly.ssl.SSLConnectionContext.ensureBufferSize(SSLConnectionContext.java:392)
at org.glassfish.grizzly.ssl.SSLConnectionContext.wrap(SSLConnectionContext.java:272)
at org.glassfish.grizzly.ssl.SSLConnectionContext.wrapAll(SSLConnectionContext.java:227)
at org.glassfish.grizzly.ssl.SSLBaseFilter.wrapAll(SSLBaseFilter.java:404)
at org.glassfish.grizzly.ssl.SSLBaseFilter.handleWrite(SSLBaseFilter.java:319)
at org.glassfish.grizzly.ssl.SSLFilter.accurateWrite(SSLFilter.java:255)
at org.glassfish.grizzly.ssl.SSLFilter.handleWrite(SSLFilter.java:143)
at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$SwitchingSSLFilter.handleWrite(GrizzlyAsyncHttpProvider.java:2503)
at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:853)
at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:720)
at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.flushQueue(FeedableBodyGenerator.java:132)
at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.feed(FeedableBodyGenerator.java:101)
at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder$FeedBodyGeneratorOutputStream.write(MultipartBodyGeneratorFeeder.java:222)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at com.ning.http.multipart.FilePart.sendData(FilePart.java:179)
at com.ning.http.multipart.Part.send(Part.java:331)
at com.ning.http.multipart.Part.sendParts(Part.java:397)
at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder.feed(MultipartBodyGeneratorFeeder.java:144)




Any idea?



2013/8/27 Ryan Lubke <[hidden email]>
Excellent!  Looking forward to the pull request!


Sébastien Lorber wrote:
Ryan, thanks, it works fine. I'll make a pull request on AHC tomorrow with better code using the same Part classes that already exist.


Reply | Threaded
Open this post in threaded view
|

Re: Upload a large file without oom with Grizzly

Sébastien Lorber
Thank you, it works fine!


I just had to modify a single line after your commit.

com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider#initializeTransport
-> clientTransport.getAsyncQueueIO().getWriter().setMaxPendingBytesPerConnection(10000);


If I leave the initial value (-1) it won't block; canWrite always returns true.


Btw, on AHC I didn't find any way to pass this value as a config attribute, nor the size of the write buffer you talked about.

So in the end, is there a way with the current AHC code to use this "canWrite = false" behavior? 
If not, can you please provide a way to set this behavior in v1.7.20? Thanks.


PS: does it make sense to use the same number of bytes in the feed(Buffer) method and in setMaxPendingBytesPerConnection(10000)? Do you have any tuning recommendation?
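
(For reference, a minimal sketch of where that setting lives on a standalone Grizzly transport; whether AHC exposes it as a config attribute is exactly the open question above, so here the transport is built directly and the value 10000 is just the one used in this thread.)

    import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
    import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;

    // Illustrative only: cap the async write queue so that canWrite() starts
    // returning false once more than maxPending bytes are queued per connection.
    public class TransportTuning {

        public static TCPNIOTransport buildTunedTransport(int maxPending) {
            TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
            transport.getAsyncQueueIO()
                     .getWriter()
                     .setMaxPendingBytesPerConnection(maxPending);
            return transport;
        }
    }

    // e.g. buildTunedTransport(10000), mirroring the value patched into
    // GrizzlyAsyncHttpProvider#initializeTransport above.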





Reply | Threaded
Open this post in threaded view
|

Re: Upload a large file without oom with Grizzly

Sébastien Lorber
By the way, do you have any idea when 1.7.20 will be released (with these new improvements)?

We would like to know whether we should wait for a release or install our own temporary release on Nexus :)



1234