On Sat 05 Sep at 09:53:34 -0400
walters(a)verbum.org said:
> The thing that concerns me with bsdiff:
> 1) Backwards compatibility over time; we don't want
> changes in the code to cause older clients to be unable
> to interpret new versions
I'm less concerned about that. I see it as a product lifecycle management
issue. As I've fiddled with things that were incompatible, I've added
them as a new bsdiff header type, as I see mendsley's done:
- if (fwrite("BSDIFF40", 8, 1, pf) != 1 ||
+ if (fwrite("ENDSLEY/BSDIFF43", 16, 1, pf) != 1 ||
At some point it's good to be able to remove old code. The broader update
mechanism in use needs a way to enforce that old clients update via a
compatibility point. That's not any different for any other part of the
OS. It's easier to do though when you control both the update mechanism
and the bsdiff (or other) code that's having a compatibility break.
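To make the header-type idea concrete, here is a minimal sketch of how a
patch reader might tell the two formats apart by their magic bytes. The
enum and function names are hypothetical and not taken from either code
base; only the two magic strings come from the diff above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format tags; these names are not from any bsdiff fork. */
enum patch_format {
    PATCH_UNKNOWN = 0,
    PATCH_BSDIFF40,         /* 8-byte magic "BSDIFF40" */
    PATCH_ENDSLEY_BSDIFF43  /* 16-byte magic "ENDSLEY/BSDIFF43" */
};

/* Inspect the first bytes of a patch and report which header it carries.
 * Buffers too short to hold a full magic are reported as unknown. */
enum patch_format detect_patch_format(const unsigned char *buf, size_t len)
{
    if (len >= 16 && memcmp(buf, "ENDSLEY/BSDIFF43", 16) == 0)
        return PATCH_ENDSLEY_BSDIFF43;
    if (len >= 8 && memcmp(buf, "BSDIFF40", 8) == 0)
        return PATCH_BSDIFF40;
    return PATCH_UNKNOWN;
}
```

A client that has dropped support for an older header could treat that
tag the same as PATCH_UNKNOWN and take whatever non-delta path the
updater offers.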
For example, Android's open source code has the ability to do this with
two-step updates in the recovery console, and one can infer through
observation of Nexus phone and tablet behavior that their GOTA admin
console actively chooses what to offer different clients upon check-in.
In ClearLinux, with coordination between devops, update and whoever
in the OS has a known compatibility break, we're practicing deploying
changes in a way that ensures the right thing happens going forward,
without stranding old installs, without us having to forever maintain
backwards compatibility code and without having to declare some old
system no longer has an update path to current.
I know there's a philosophical camp that says never break old stuff or
remove symbols. With a security hat on, I want to more aggressively
remove old code.
In our bsdiff code anyway we haven't made that many incompatible changes
and IIRC none in a long time. And our binary deltas are always optional,
with the updater able to fall back to a full file download if a bsdiff
is not available or does not correctly apply and recreate exactly the
desired file. This means we can easily trial new compression methods, for
example, without stranding older systems or adding server side complexity
to mux out the right version deltas.
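The fallback rule described above can be sketched as a small decision
function. This is only an illustration under assumed names: the enums and
the function are hypothetical, and the actual updater logic may be
structured quite differently.

```c
#include <stdbool.h>

/* Hypothetical outcome of trying the optional binary delta. */
enum delta_status { DELTA_NOT_AVAILABLE, DELTA_APPLY_FAILED, DELTA_APPLIED };

/* Hypothetical next step for the updater. */
enum update_action { KEEP_DELTA_RESULT, DOWNLOAD_FULL_FILE };

/* Keep the patched file only if the delta applied and the result is
 * exactly the desired file; in every other case fetch the full file. */
enum update_action choose_update_action(enum delta_status status,
                                        bool result_matches_expected)
{
    if (status == DELTA_APPLIED && result_matches_expected)
        return KEEP_DELTA_RESULT;
    return DOWNLOAD_FULL_FILE;
}
```

Because an unreadable or unknown delta simply lands in the full-download
branch, a client that does not understand a new compression method is
slower to update but not stranded.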
Practically speaking, a fallback route is required in some form, since we
have a rapid release cycle and, while we try to bury normal update costs
on the server side, we can't generate the full O(n^2) set of possible
bsdiffs from the full history of releases up to the current one.
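For a sense of scale, the arithmetic behind the O(n^2) remark can be
sketched as below. The "window" policy, where each release only gets
deltas from a limited number of preceding releases, is an assumption made
for illustration and not a description of what any particular server
does; both function names are hypothetical.

```c
/* Number of (older, newer) release pairs among n releases: n*(n-1)/2. */
unsigned long long all_release_pairs(unsigned long long n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Pairs needed if each release only gets deltas from at most `window`
 * immediately preceding releases (hypothetical policy). */
unsigned long long windowed_release_pairs(unsigned long long n,
                                          unsigned long long window)
{
    unsigned long long total = 0;

    /* i is the number of older releases that exist for a given release. */
    for (unsigned long long i = 1; i < n; i++)
        total += (i < window) ? i : window;
    return total;
}
```

With 1000 releases, the full set is 499500 release pairs, while a window
of 10 needs 9945. Clients older than the window are the ones that rely on
the full file download.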
> 2) Are there any inputs for which bsdiff generates
> incorrect output?
> I don't know if this has happened, but it is a concern.
> This one I think can mostly be mitigated by re-applying the
> patch at build time and verifying it's identical. The cost of that
> would be small for the server. But it ties in with #1 - as
> the bsdiff code changes, all historical versions would need to be
> checked as well.
I forgot to mention that! In our source tarball please see the
test/bsdiff directory for a simple script and "interesting" file sets.
I've been meaning to make a set of file pairs that lead to the full
set of combinations of compression types and c/d/e blocks. I'd like
to have a git commit hook to run such an A/B test on any commits to
my bsdiff fork. Then I can ensure bsdiffs are still created correctly
and that they still apply correctly. But I haven't gotten around to
that over the course of a few years.
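The core of such an A/B check could look roughly like the sketch below:
create a delta, apply it to the old file, and require the result to be
byte-identical to the target. The callback types and all names are
hypothetical, since real bsdiff forks expose different APIs, and the two
toy functions are only stand-ins (the "delta" is a plain copy of the
target) so that the harness can be exercised without a real bsdiff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback shapes for the diff and patch builds under test.
 * Each writes at most out_cap bytes to out, stores the count in *out_len,
 * and returns false on error. */
typedef bool (*make_delta_fn)(const unsigned char *old, size_t old_len,
                              const unsigned char *target, size_t target_len,
                              unsigned char *out, size_t out_cap, size_t *out_len);
typedef bool (*apply_delta_fn)(const unsigned char *old, size_t old_len,
                               const unsigned char *delta, size_t delta_len,
                               unsigned char *out, size_t out_cap, size_t *out_len);

/* Create a delta, apply it to the old file, and require the output to be
 * byte-identical to the target. */
bool roundtrip_ok(make_delta_fn make, apply_delta_fn apply,
                  const unsigned char *old, size_t old_len,
                  const unsigned char *target, size_t target_len,
                  unsigned char *delta_buf, size_t delta_cap,
                  unsigned char *out_buf, size_t out_cap)
{
    size_t delta_len = 0, out_len = 0;

    if (!make(old, old_len, target, target_len, delta_buf, delta_cap, &delta_len))
        return false;
    if (!apply(old, old_len, delta_buf, delta_len, out_buf, out_cap, &out_len))
        return false;
    return out_len == target_len && memcmp(out_buf, target, target_len) == 0;
}

/* Toy stand-in for a diff build: the "delta" is a copy of the target. */
static bool toy_make(const unsigned char *old, size_t old_len,
                     const unsigned char *target, size_t target_len,
                     unsigned char *out, size_t out_cap, size_t *out_len)
{
    (void)old; (void)old_len;
    if (target_len > out_cap)
        return false;
    memcpy(out, target, target_len);
    *out_len = target_len;
    return true;
}

/* Toy stand-in for a patch build: copies the "delta" to the output. */
static bool toy_apply(const unsigned char *old, size_t old_len,
                      const unsigned char *delta, size_t delta_len,
                      unsigned char *out, size_t out_cap, size_t *out_len)
{
    (void)old; (void)old_len;
    if (delta_len > out_cap)
        return false;
    memcpy(out, delta, delta_len);
    *out_len = delta_len;
    return true;
}
```

For an A/B run, make and apply would come from different builds, for
example the previous commit and the candidate commit, and the check would
be repeated over each file pair in the test set in both directions.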
> The OSTree project maintainers have been contributing
> to:
> https://github.com/mendsley/bsdiff
I only see one branch, last updated 6 months ago. But 34 forks. Ooof.
Looking more broadly, there are 68 instances of a bsdiff in C on GitHub.
Double Ooof.
> It would be quite nice if some effort was made to
> merge there;
> there's been some very similar changes made. Would you
> be open to having it be a standalone git repository at least?
In the short term it's not work for which I have a lot of resources,
but longer term it would be great to unfork. With multiple folks using
some fork of Colin Percival's original, it's better for any of us who
collaborate.
--
Tim Pepper -- Linux OS Engineering
Intel Open Source Technology Center