sapientdust+
2016-08-04 23:40:49 UTC
I've recently run into the issue of clamd not being able to scan files
that are larger than a small number of GB, and I have seen the
warnings in `man clamd.conf` that say specified limits above 4GB are
ignored.
Could developers or other folks familiar with the clamd codebase
comment on the feasibility of scanning large files in multiple pieces
as a way of handling larger files?
For example, given a file that is 6GB, does using multiple INSTREAM
calls (that's how I'm interacting with clamd currently) to check the
full 6GB seem like it should work reliably?
INSTREAM: bytes 0-1000MB
INSTREAM: bytes 900MB-1.9GB
INSTREAM: bytes 1.8GB-2.8GB
INSTREAM: bytes 2.7GB-3.7GB
INSTREAM: bytes 3.6GB-4.6GB
INSTREAM: bytes 4.5GB-5.5GB
INSTREAM: bytes 5.4GB-6.0GB
There is overlap above, wherein the 100MB of data that starts at the
900MB position is scanned twice, once in the first call (as the last
100MB of that stream) and once in the second call (as the first 100MB
of that stream), to reduce the possibility of a virus being split into
two pieces and therefore not recognized.
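The ranges listed above can be generated mechanically. What follows is only a sketch, assuming decimal megabytes, a 1000MB chunk size and a 100MB overlap. The name `chunk_ranges` and its parameters are hypothetical and not part of clamd.

```python
MB = 1000 * 1000  # assumption: decimal megabytes, as in the list above


def chunk_ranges(total_size, chunk_size, overlap):
    """Yield (start, end) byte offsets, end exclusive.

    Consecutive ranges share `overlap` bytes, so data that straddles a
    boundary between two ranges is fully contained in at least one of
    them, provided it is no longer than `overlap`.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size)
        yield start, end
        if end == total_size:
            break  # the last range reached the end of the file
        start += step


if __name__ == "__main__":
    for start, end in chunk_ranges(6000 * MB, 1000 * MB, 100 * MB):
        print(f"INSTREAM: bytes {start // MB}MB-{end // MB}MB")
```

For a 6GB file this yields seven ranges, matching the list above. A signature longer than the overlap could still be split across two ranges, so the overlap size is a trade-off rather than a guarantee.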
If ClamAV needs the first bytes of a file in order to know what kind of
file it is scanning and to trigger filetype-specific heuristics, then
something like the above could be adapted so that the first N bytes of
the first chunk are prepended to each subsequent chunk that is checked
for that file.
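The header-prepending idea can be sketched as follows. This is an illustration under assumptions, not a tested recipe: `chunk_pieces` and `instream_frames` are hypothetical helper names, and the value of N (`header_len`) is a guess the caller has to make. The framing follows the INSTREAM description in the clamd man page: the command, then pieces each prefixed by a 4-byte length in network byte order, then a zero-length piece to end the stream.

```python
import struct


def chunk_pieces(f, start, end, header_len, block=1 << 16):
    """Yield the bytes [start, end) of the seekable binary file f in small
    blocks, preceded by the file's first bytes when start > 0."""
    if start > 0:
        f.seek(0)
        # Never prepend more than `start` bytes, so that a chunk beginning
        # inside the header region does not contain duplicated data.
        yield f.read(min(header_len, start))
    f.seek(start)
    remaining = end - start
    while remaining > 0:
        data = f.read(min(block, remaining))
        if not data:
            break  # file is shorter than `end`
        remaining -= len(data)
        yield data


def instream_frames(pieces):
    """Wrap an iterable of byte strings in clamd INSTREAM framing."""
    yield b"zINSTREAM\0"
    for piece in pieces:
        if piece:  # a zero-length piece would end the stream early
            yield struct.pack("!I", len(piece)) + piece
    yield struct.pack("!I", 0)  # terminator
```

Each frame would be written to the clamd socket in order (for example with `sock.sendall(frame)`), after which the reply is read from the same connection. The total size of every stream, including the prepended header, still has to stay below the `StreamMaxLength` configured in clamd.conf. Whether a prepended header actually makes the scanner treat the chunk as the original file type is exactly the open question here, so this only shows the mechanics.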
Thanks for any guidance or feedback you can provide.
_______________________________________________
Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq
http://www.clamav.net/contact.html#ml