Self Documenting Code: 2011

Tuesday, December 13, 2011

Google Apps CNAME activation - here be more dragons

It used to be you could create a CNAME record to verify your ownership of a domain name, when setting up a Google Apps account.

To this day, current Google Apps documentation still says you can do it :

Troubleshooting CNAME Records :

There are two reasons to create a CNAME record for Google Apps: verifying domain ownership and customizing a URL.

Dig as you might through the user interface, you'll be offered a way to verify using TXT records, but nada for CNAME records.

Some enterprising folk around 18 months ago figured out a workaround. But even that no longer works.

It seems that Google has quietly - nay silently - axed support for CNAMEs for domain verification.

Whilst still mentioning in current documentation that you can.

So the nontrivial effort I've just gone through getting a CNAME record created by an extraordinarily-slow-to-respond IT department was a waste.

But not just any kind of waste. A waste for which Google itself is responsible.

It's one thing to remove a feature. Sure - I have no problem with that.

After all, it's a free service anyway, so no complaints there.

But at least tell us.

The current documentation which says you can use a CNAME for domain verification, combined with the preponderance of articles throughout the web explaining how to do it, combined with the complete absence of acknowledgement from Google that they've removed the option, combined with (in the first few pages of search results) complete absence of any bloggers like me (that I managed to find) pointing out the omission, was a perfect recipe for getting me - and doubtless many others - to waste time arranging things that Google no longer supports.

Just tell us.

Y'know, a Google blog post or something with all the right keywords : "by the way, kids, we're axing the CNAME option for Google Apps domain name verification".

By not telling us, and especially by failing to properly update your documentation, you've wasted my time, and doubtless that of others.

Not happy.

Not impressed.

What is wrong with a little footnote in your other help pages (the ones that do now mention TXT records), that CNAME records used to be supported but no longer are?

I mean, if I'd come across one official page (as I did - linked above) on the Google site saying "CNAMEs are supported" and another one saying "CNAMEs used to be supported but no longer are", I'd figure the second supersedes the first. But what's with this insistence on not even mentioning the removal of the CNAME option?

It makes no sense to me, and it is an exercise in providing a bad customer experience. I'm generally a raving Google fan, but this is certainly one of those big speed bumps you really don't want to be giving even your fans.

Sure, it wouldn't hurt anywhere near so much if there wasn't an unresponsive IT department in the picture, making all DNS changes take hours of wasted time and days of elapsed time. But at least if Google's information had been more open and accurate about the change of features, I could have ensured we weren't now in for a second round of delay.

The lesson? When writing documentation, never assume you've correctly updated every place that needs updating. (Hopefully you have updated everything, but don't assume it.) Include a note as to the change, so that a) people used to the old system can be less confused; and b) if any bits of doco slipped through the cracks, people will be able to figure out (with the aid of the change note) what information is current vs outdated.

Thus saith the blogger.

Wednesday, November 2, 2011

Best open-source .NET text file differencing library

After a fairly thorough search for open-source .NET libraries for text file (e.g. source code) differencing, I've concluded that there are only two serious contenders :

google-diff-match-patch; and
DiffPlex

Both look like they would meet my needs.

DiffPlex is C# only, whereas google-diff-match-patch contains equivalent implementations in Java, JavaScript, C++, Objective-C, and more, so if you like the idea of learning an API once and using it e.g. in iOS projects or in web browsers (JavaScript) directly, google-diff-match-patch is for you.

The DiffPlex API seems a little nicer if all you want is a simple diff.

google-diff-match-patch supports - as its name suggests - producing and processing patch files - so if you need the extra features, google-diff-match-patch wins again.

DiffPlex contains what appears to be a very nice & simple API to drive diff viewers. google-diff-match-patch does have something similar, even if the API is not as nice.

Both support a line-by-line mode.
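For anyone unsure what "line-by-line mode" means in practice, here's a rough sketch. Note that it uses neither of the two libraries above : it's Python's standard-library difflib, swapped in purely for illustration, and line_diff is a made-up helper name. It treats each whole line as one unit, so a line that changed by a single character shows up as one removed line plus one added line.

```python
import difflib


def line_diff(old_text, new_text):
    """Return a line-level diff as (marker, line) pairs.

    Markers: "=" unchanged, "-" only in old_text, "+" only in new_text.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # Compare whole lines rather than characters; autojunk is disabled so
    # that repeated lines are never silently treated as noise.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result += [("=", line) for line in old_lines[i1:i2]]
        else:
            # "replace", "delete" and "insert" all reduce to removed lines
            # followed by added lines (either slice may be empty).
            result += [("-", line) for line in old_lines[i1:i2]]
            result += [("+", line) for line in new_lines[j1:j2]]
    return result


if __name__ == "__main__":
    old = "alpha\nbeta\ngamma\n"
    new = "alpha\nbeta2\ngamma\ndelta\n"
    for marker, line in line_diff(old, new):
        print(marker, line)
```

A character-level diff of the same input would instead report just the inserted "2", which is the kind of difference that matters when choosing between modes.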

google-diff-match-patch has a nice feature where it can simplify diffs down from "perfect" diffs to more semantically-meaningful diffs. It calls this a "cleanup" operation, and depending on your needs, that could be a deciding feature. My immediate needs are so simple that even cleanup isn't relevant, but if it's relevant for you, google-diff-match-patch might be the go (unless I missed a similar feature in DiffPlex, but I'm pretty sure I didn't miss that feature).

In short, it seems both are suitable. DiffPlex has a nicer API for the world of .NET (e.g. C#-style naming conventions used throughout) whilst google-diff-match-patch has more features. For my needs - an open-source, native .NET differencing library - both libraries look very suitable and DiffPlex looks a little easier to learn and use (not that either is hard). But I think in the end I'm going to start with google-diff-match-patch on account of the multiple platforms it supports with a uniform API, and the cleanup facility which whilst not relevant immediately is perfect for something I'm planning to do in the future...

If you know of any other serious contenders, let me know, but I'm only interested in native .NET open-source libraries that can be downloaded and used without modification (so that excludes repurposing code in open-source diff viewers). And I did review a few options on Code Project but nothing there compelled me to believe their performance would be any better than the two projects I shortlisted, whereas I expect that these two shortlisted projects will have much better ongoing support.

Posted largely for my own future reference, but also to help other wandering developers. :o)

Tuesday, May 24, 2011

aspnet_merge unresolved assembly reference not allowed in ASP.NET 4

Hopefully this'll save someone else some time.

I have a moderately large VB.NET ASP.NET website, originally created in ASP.NET 2 and recently upgraded to ASP.NET 4.

I recently built my own packaging scripts that use aspnet_compiler and aspnet_merge.

I was careful to use the .NET Framework v4 version of aspnet_compiler.

When running aspnet_merge on the precompiled website, I got a very strange error :

Utility to merge precompiled ASP.NET assemblies. Version 3.5.30729.
Copyright (c) Microsoft Corporation 2007. All rights reserved.

aspnet_merge: error occurred: An error occurred when merging assemblies: Unresolved assembly reference not allowed: Microsoft.VisualBasic.

The problem was pretty obvious, but stumped me for probably an hour or so : I was using the wrong version of aspnet_merge.exe.

This page helped me find the correct version. On my computer, that's in :

C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\NETFX 4.0 Tools\aspnet_merge.exe
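A check like the following in a packaging script would catch the mismatch before the confusing error appears. It's only a sketch, in Python rather than whatever your scripts are written in : banner_version and looks_too_old are made-up names, and I'm assuming (without having verified it) that the ASP.NET 4 build of aspnet_merge reports a major version of 4 or higher in its banner, the same way the old one reported 3.5.30729 above.

```python
import re


def banner_version(tool_output):
    """Pull (major, minor, build) out of a banner such as 'Version 3.5.30729.'.

    Returns None if no version number is found in the text.
    """
    match = re.search(r"Version (\d+)\.(\d+)\.(\d+)", tool_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def looks_too_old(tool_output, required_major=4):
    """True if the banner reports a major version below required_major.

    Assumption: the tool's banner version tracks the .NET Framework
    version it targets.
    """
    version = banner_version(tool_output)
    return version is not None and version[0] < required_major


if __name__ == "__main__":
    banner = "Utility to merge precompiled ASP.NET assemblies. Version 3.5.30729."
    print(banner_version(banner), looks_too_old(banner))
```

Feed it the first line that aspnet_merge prints and fail the build early if looks_too_old returns True.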

HTH :o)

Thursday, April 14, 2011

Bazaar repository bloat - rebase, merge, push, pull

UPDATE : Repository bloat (at least in the merge-then-merge-back scenario) can be solved very easily : "bzr pack --clean-obsolete-packs". The content of this article is interesting in the things it examines, but somewhat outdated by this update. Read at own risk...

Bazaar is awesome. I say that almost every time I talk about Bazaar. I love it.

However, there are some use cases where you can end up with repository bloat, completely unnecessarily.

Repository bloat occurs when Bazaar decides - for whatever reason - to make a new version of an existing revision, and thereby duplicate data that was in the revision.

So for example, you have a 5MB file and you add it to a branch, and then merge the branch with the trunk. Trunk's repository size should increase by roughly 5MB, you say? Well, should, you're right, but depending on how you do it, you can actually end up with a 10MB increase instead.

Repository bloat.

So how do you avoid it?

Well, I've noticed repository bloat in two main situations (although there are likely more). Both situations involve the trunk and branch diverging - so if your workflow is such that the trunk and branch are always sync'ed before divergence happens (i.e. when there are only changes on one side or the other but not both), then repository bloat won't be a problem for you (but rare will be workflows where you can guarantee that!)

Here are the two repository bloat scenarios I've noticed :

1) rebase : your revisions get rewritten to your local repository.

e.g.

md trunk
cd trunk
bzr init
echo Hi>readme.txt
bzr add
bzr commit -m "trunk commit"
cd..
bzr branch --stacked trunk branch
cd branch
(put 5MB file called BigFile.dat in branch folder)
bzr add
bzr commit -m "Added BigFile.dat in branch"

So far so good. And if you rebase at this point, you're fine ('coz nothing will happen).

But if we continue :

cd ../trunk
echo A change in trunk>>readme.txt
bzr commit -m "Another trunk commit"
cd ../branch
bzr rebase

... well, the rebase runs just fine, but if you check the size of the .bzr folder in the branch, it is around 10MB, not 5!

Repository bloat!
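If you'd rather measure the bloat than eyeball it in a file manager, something like this does the job. It's a minimal sketch in plain Python and has nothing to do with Bazaar's own tooling : folder_size_bytes is just a helper name I've made up, and it simply adds up the sizes of every file under a folder such as branch/.bzr.

```python
import os


def folder_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            # Skip symbolic links so a link's target isn't counted as if it
            # lived inside this folder.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


if __name__ == "__main__":
    # Hypothetical location : run from the folder containing "branch".
    size = folder_size_bytes(os.path.join("branch", ".bzr"))
    print("%.1f MB" % (size / (1024.0 * 1024.0)))
```

Run it before and after the rebase and compare the two numbers.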

How to avoid repository bloat when rebasing?

Well, the conclusion I've come to is : let the repository bloat, and merge or push to trunk, and then follow my instructions on purging stacked branches to remove the bloat. (The bloat in rebase cases is only in the branch, not the trunk. And fortunately, it seems that pushing the bloated repository to the trunk only pushes the new versions of the affected revisions instead of pushing both old and new versions - i.e. the bloat is fortunately not propagated back to the trunk in this case.)

(Not using stacked branches? Sorry, not my use case, so I haven't investigated further and thus can't tell you for sure what will work - although if you get really really desperate you can make a new branch --no-tree and then delete the .bzr folder in your existing branch and replace it with the .bzr folder in the new branch. Again - only do that at a point where trunk and branch are in-sync.)

2) merge to branch then merge to trunk

This is a pretty standard operation if you've been working on your branch for a while and the trunk has changed in the meantime.

You can't pull the trunk changes into the branch. Once the two are out-of-sync, you're forced to use merge or rebase. The rebase scenario is covered above, and results in duplication of data in branch revisions from the point of divergence onwards.

The merge scenario is what we're covering here. Its repository bloat characteristics are more interesting. Whereas rebase results in duplication of data in BRANCH revisions from the point of divergence onwards, merge can result in duplication of data in TRUNK revisions from the point of divergence onwards, assuming that you proceed to merge branch back into trunk. (If you PUSH branch back into trunk, I suspect (but haven't tested) that you'll get away without repository bloat - but then you lose the trunk's unique perspective on the change history - i.e. your log and qlog are thereafter from the branch's perspective instead of from the trunk's perspective.)

e.g.
md trunk
cd trunk
bzr init
echo Hi>test.txt
(add 5MB file called BigFile.dat into trunk folder)
bzr add
bzr commit -m "Initial commit in trunk"
cd..
bzr branch --stacked trunk branch
cd branch
echo bla>test2.txt
bzr add
bzr commit -m "First commit in branch"
cd ..
cd trunk
(replace BigFile.dat in trunk folder with a different 5MB file of the same name)
bzr commit -m "Modified BigFile.dat"
cd ..
cd branch

OK - so far so good - but trunk and branch have diverged and now we're at the point we want to make them converge. Normally we might do :

bzr merge ../trunk
bzr commit -m "Merged trunk changes into branch"
cd ..
cd trunk
bzr merge ../branch
bzr commit -m "Merged branch into trunk"

... but if you do that, you'll get our lovely friend Repository Bloat(TM)!

Why?

Well, it seems that merging the 5MB file's modification revision in from trunk to branch, which requires a commit, results in that 5MB file's data ending up in a second revision, and when we merge back into trunk, that second revision ends up in the trunk's repository. (Interestingly, this does not happen if the file was newly created in the trunk - only if it was already known to the branch and was updated in the trunk.)

10MB repository growth for a 5MB file. Baaaaad.

(To emphasize : the final trunk repository size is 15MB : 5MB after initial commit of the 5MB file, then a further 5MB totalling 10MB after second commit to trunk, and finally a third 5MB totalling 15MB after merging in from branch and committing again.)

We saw how to get around it with the rebase bloat problem. How to get around it with the merge bloat problem?

One way is to avoid the merge-then-merge-back entirely. If trunk has changed and you can't pull the changes into the branch because trunk and branch have diverged, then rebase instead. You might/will end up with branch repository bloat, but I cover how to deal with that in the preceding section on repository bloat caused by the rebase operation.

All a bit tedious? Perhaps. But easily scriptable.

Of course, if your workflow relies on the merge process, you might just have to accept the bloat. Not ideal. You might be able to avoid the bloat by using the merge -c option when merging back into trunk, to "cherry-pick" only the branch revisions that are not themselves merge-from-trunk commits. And there are yet more desperate approaches one could take if needed - e.g. export branch changes to a patch set, delete branch, recreate it from trunk and apply patches!!! Well y'know, it would probably work.......

And maybe I need my head checked, but even with a few little problems like this, I still absolutely love Bazaar. (Yes - relatively little. In practice, does it matter if your repository is twice the size it needs to be? Sometimes yes, usually no. For me, it's a little more critical than for others due to certain peculiar circumstances, and hence my investigations in how to avoid/resolve repository bloat.) Thanks for stopping by! :o)

Purging stacked branches in Bazaar

Stacked branches are awesome!

Shared repositories only go so far : they don't work so well if the parent and child branches are far away from each other in the file system (nor if they are on different volumes), and they have the weakness that if you create a revision, it lives on forever, even if you later delete the branch associated with that revision. You can't actually get the revision back, not by any way I've found (UPDATE : "bzr heads --all" looks like it lets you find "lost" revisions), but the shared repository's size never goes down - it just keeps accruing more and more data, never letting any of it go. (UPDATE : I'm no longer entirely sure when the repository's size changes - "bzr pack --clean-obsolete-packs" does wonders.)

In contrast, stacked branches can be used at any time both the parent and child branch are simultaneously accessible (even if they're on different hard disks or even one on a URL), and best of all, if you make an experimental branch and decide to kill it, bam! - its history is gone forever and your trunk repository isn't forever bloated by the revisions you decided to nuke.

And they're extremely useful if you want the same library to be in multiple apps (in different Bazaar repositories) and want to be able to edit the source code in each copy of the library independently but have them all closely associated.

And did I mention they save a lot of storage space?

But thence cometh the problem : stacked branches start out tiny, because they aren't carrying the five decades of history that the trunk contains, but after that they grow.

And grow.

What if you just want the stacked branch repositories to stay nice and trim, like they were when you made them?

There doesn't seem to be any built-in feature in Bazaar to do that.

push, pull, merge, do whatever you want - the stacked branch's repository only grows.

So we resort to a little bit of - very effective - skullduggery.

FIRST UP, ENSURE YOU TRY THIS EXPERIMENTALLY FIRST. It worked for me, but might destroy you and your world and your company's beautiful source code and get you fired. THIS USES UNDOCUMENTED TRICKS. So it could stop working when new versions of Bazaar roll out. I have and accept no responsibility for what happens to you if you try this yourself!

1) Purging the stacked branch history obviously needs to be done at times that the stacked branch is in-sync with the trunk. So make sure you've merged or pushed the branch into the trunk.

2) In the branch, delete all files in these two folders :
.bzr\repository\indices
.bzr\repository\packs

3) Still in the branch, locate this file :
.bzr\repository\pack-names
... and change its content to the following five lines :

B+Tree Graph Index 2
node_ref_lists=0
key_elements=1
len=0
row_lengths=

Voila! Do a bzr status or bzr log and the history is all there - it's just now coming from the stacked-on branch like you wanted all along. You have successfully purged the stacked branch's history.
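Steps 2 and 3 are easy to script. Here's a rough Python sketch of them, with all the same warnings as above and a few assumptions of my own flagged loudly : purge_stacked_branch and EMPTY_PACK_NAMES are made-up names, and writing the five lines with Unix line endings and a trailing newline is my guess at what Bazaar expects, not something documented here. It does NOT check that the branch is in-sync with the trunk (step 1) - that part is still on you.

```python
import os

# The five lines from step 3. ASSUMPTION : "\n" line endings and a trailing
# newline after "row_lengths=".
EMPTY_PACK_NAMES = (
    "B+Tree Graph Index 2\n"
    "node_ref_lists=0\n"
    "key_elements=1\n"
    "len=0\n"
    "row_lengths=\n"
)


def purge_stacked_branch(branch_path):
    """Empty the indices and packs folders and reset pack-names.

    Returns the number of files deleted. Only run this on a stacked branch
    that has already been merged or pushed into its trunk.
    """
    repo = os.path.join(branch_path, ".bzr", "repository")
    pack_names = os.path.join(repo, "pack-names")
    if not os.path.isfile(pack_names):
        # Refuse to touch anything that doesn't look like the expected layout.
        raise FileNotFoundError("no pack-names file under " + repo)

    removed = 0
    for folder_name in ("indices", "packs"):
        folder = os.path.join(repo, folder_name)
        if not os.path.isdir(folder):
            continue
        for name in os.listdir(folder):
            full_path = os.path.join(folder, name)
            if os.path.isfile(full_path):
                os.remove(full_path)
                removed += 1

    # newline="\n" stops Python translating line endings on Windows.
    with open(pack_names, "w", newline="\n") as handle:
        handle.write(EMPTY_PACK_NAMES)
    return removed
```

Try it on a throwaway copy of the branch first, then run bzr status and bzr log to confirm the history still shows up.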
