Archive for September, 2003
Did I mention that programming can be annoying? I think I probably did at some point. Like most programmers [I’m guessing] I often fantasize about careers where no matter how rough a job goes, once the job is done it’s done.
When programming you are often creating a highly complex and automated entity, which needs to keep on doing its job once you are gone. Unfortunately, it is virtually impossible to write 100% bug-free code. So sooner or later, something you thought you had finished will break, and you will be expected to fix it. *sigh*
At least when it’s your own application that breaks you have a vested interest in fixing it.
Meanwhile, in a highly unlikely parallel universe…
If you are like me then you are probably largely ignorant of the aggregation/syndication side of blogging, but it is an interesting concept. Basically, instead of the bad old days where you had a newsreader for reading Usenet groups [and the Usenet groups began to suck because of trolls/spamming etc…], you now get these things called news aggregators, which allow you to set up feeds from all your favourite news sites/blogs etc. I’m currently using a free Windows-based one called SharpReader, which seems to work fine.
Although it’s not really my role to explain what these things are, since I spent so long wondering why there were all these XML links on people’s pages [with no explanation of what they were for], I thought it might be nice for me to break with tradition and actually point out to the uninitiated what you’re supposed to do with them [it’s like a weird reversal of the old annoying custom that people used to have of putting download links to Netscape and Internet Explorer on their web pages].
How aggregation works:
- Install a reader (eg SharpReader)
- Go to your favourite journal style web sites, and look for the link to the RSS feed, usually named something like rss.xml, and often indicated with an icon like the one above.
- Copy the URL or drag the link into your reader. There is another mechanism where you just click a "subscribe" link but I’m not sure if that’s standard, since none of the sites I visit seem to use it.
Your reader will then periodically scan your various feeds [perhaps hourly] and show you when new articles appear (by changing an icon in the Notification area on the TaskBar). And you hopefully won’t waste quite so much time shuffling around the same 5-10 webpages all day waiting for someone to post something new to read. Your aggregator will show you when something new turns up.
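For the curious, the "scan the feed and flag anything new" step can be sketched in a few lines of Python. This is only an illustration of the idea, not how SharpReader works internally: the sample feed, the `new_items` function and the example.com URLs are all made up, and it assumes a plain RSS 2.0 feed in which every item carries a link.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a real rss.xml download.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>Second post</title><link>http://example.com/2</link></item>
    <item><title>First post</title><link>http://example.com/1</link></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for items whose link has not been seen yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            fresh.append((title, link))
    return fresh


# Links remembered from the previous scan (hypothetical state).
seen = {"http://example.com/1"}
for title, link in new_items(SAMPLE_FEED, seen):
    print("new:", title, link)
    seen.add(link)
```

A real reader would run this on a timer for every subscribed feed and keep the set of seen links on disk between runs.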
The main problem with aggregation is that if you offer a feed which attracts thousands of subscribers, then you will be receiving thousands of hits every hour from all their aggregators constantly checking to see if you’ve updated anything, and this may cost bandwidth. Conversely, if you are a user who perversely subscribes to every feed you ever come across, you will also be using tonnes of bandwidth as your aggregator constantly checks for updates. I expect some kind of Google meta-feed system is not far off, where master feeds are generated in real time based on search terms [a bit like Google News].
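HTTP itself has a standard way of making each of those checks cheaper: a conditional GET, where the client sends back the `ETag` or `Last-Modified` value it saw last time and the server answers with a tiny "304 Not Modified" if nothing has changed. Whether any particular aggregator or feed host bothers with it is another matter, so treat the following as a sketch only. The function names and the feed URL are hypothetical.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that asks the server to reply only if the feed changed."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    req = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise


# Building the request needs no network access; fetch_if_changed does.
req = build_conditional_request("http://example.com/rss.xml", etag='"abc123"')
print(req.header_items())
```

This only helps when the server supports validators; a feed that is regenerated on every request will still be sent in full each time.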
DanG very kindly sent me some code for implementing the mysterious OLE IPicture functionality, and lo, I am now able to load jpeg and gif files!
To the right is a screen grab from my [dusty old] prototype HTML renderer, something which will maybe get incorporated into Book Reader at some point. Until now it was only able to display BMP format images, which as mentioned previously, is not the best file format around.
I really must spend some time getting to know OLE/COM/ATL… I think I was turned off a little early in my programming life by the lack of documentation for doing any OLE stuff without MFC.
What’s wrong with MFC?
To be honest I can’t give a definitive answer, except to say that for the most part MFC is just a bunch of C++ wrappers for the WinAPI, and maybe the WinAPI is not so scary that it needs to be abstracted away to that degree. These days I generally like to avoid MFC so as to preserve an understanding of how things actually work.
I think I first lost patience with MFC when it came to the area of command routing, the method it uses for passing Menu/Hotkey messages around, and for updating the appearance of menu items (checked/disabled etc). This system worked great until you needed it to do something extra, or a little differently. I can’t remember the details but I did spend a long time banging my head against CCmdTargets and trying to work out how to automatically update the appearance of context menus [I am certain that there is a way, I just gave up before I found it].
I thought it was pretty silly that you had to manually put HotKey descriptions into your menu items, and I also thought it was excessive that every single command added a new function to the main window/application class (as well as an optional UpdateCmdUI function).
On most occasions when MFC caused a problem for me, the nature of the problem had to do with the fact that MFC was hiding Win32 functionality from me, and had I understood the Win32 functionality in the first place then the problem would have either never occurred or been easily rectified. Once again I refer you to the wisdom of Joel Spolsky, and his article The Law of Leaky Abstractions.
*UPDATE: An alert reader points out that only OLE/COM avoiding luddites like m’self suffer from this image loading deficiency, and that there are in fact OLE image functions which support multiple file types.
Everyone knows that GIF, JPG and BMP [and TIF and a few others] are completely standard image formats, right? So perhaps you can guess how many of these are supported natively by the Windows API*.
[--- please guess now ---]
If you guessed exactly one, you guessed correctly. BMP is what it is. And it’s not even a very good format. No one stores images as BMP unless they are forced to.
But not only that…
I recently thought, hell, I at least need to be able to get the size of an image (so that HTML Editor can do some nice auto-processing), so I’ll just look up the details of the different formats and create an all-purpose function to return the dimensions of images. This is of course easy since all images will have their widths and heights stored conveniently at the beginning of the file for easy retrieval.
That is the assumption I made, and it turns out to be an incorrect one.
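For what it’s worth, the assumption does hold for the simpler formats: BMP and GIF both keep their dimensions at fixed offsets near the start of the file. Here is a minimal Python sketch of reading them. The function names are made up for illustration, and the BMP half only handles the 12-byte OS/2 core header and the 40-byte-or-larger Windows info headers.

```python
import struct


def bmp_dimensions(data):
    """Return (width, height) from the start of a BMP file."""
    if len(data) < 26 or data[:2] != b"BM":
        raise ValueError("not a BMP file")
    (header_size,) = struct.unpack("<I", data[14:18])
    if header_size == 12:
        # Old OS/2 BITMAPCOREHEADER: 16-bit unsigned width and height.
        return struct.unpack("<HH", data[18:22])
    # BITMAPINFOHEADER and later: 32-bit signed values; a negative height
    # just means the rows are stored top-down.
    width, height = struct.unpack("<ii", data[18:26])
    return width, abs(height)


def gif_dimensions(data):
    """Return the logical screen (width, height) from the start of a GIF file."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    return struct.unpack("<HH", data[6:10])


# Hand-built headers standing in for real files.
bmp_header = b"BM" + bytes(12) + struct.pack("<Iii", 40, 640, -480)
gif_header = b"GIF89a" + struct.pack("<HH", 320, 200)
print(bmp_dimensions(bmp_header), gif_dimensions(gif_header))
```

Only the first couple of dozen bytes of each file are needed, so a real tool could read just that much rather than loading the whole image.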
The resolution in a JPG header is a measure of pixel density, and in theory it means pixels per unit. Of course the default unit is a pixel, so what these fields store more often than not is a measure of pixels per pixel, which as you might imagine, usually equates to 1. Brilliant.
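To make that concrete, here is a small Python sketch that pulls the density fields out of a JFIF header. It assumes the file is a JFIF-style JPEG whose APP0 segment sits immediately after the start-of-image marker; `jfif_density` and the sample bytes are made up for the example.

```python
import struct

# Meaning of the one-byte "units" field in the JFIF APP0 segment.
UNITS = {0: "no units (aspect ratio only)", 1: "dots per inch", 2: "dots per cm"}


def jfif_density(data):
    """Return (units description, x density, y density) from a JFIF header."""
    if len(data) < 18 or data[:4] != b"\xff\xd8\xff\xe0" or data[6:11] != b"JFIF\x00":
        raise ValueError("no JFIF APP0 segment right after the SOI marker")
    # Layout: SOI(2) marker(2) length(2) "JFIF\0"(5) version(2) units(1) x(2) y(2)
    units, x_density, y_density = struct.unpack(">BHH", data[13:18])
    return UNITS.get(units, "unknown"), x_density, y_density


# Hand-built header with units = 0 and a density of 1 x 1, the case griped about above.
sample = (b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01"
          + struct.pack(">BHH", 0, 1, 1) + b"\x00\x00")
print(jfif_density(sample))
```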
For some reason JPEG files store the resolution of the image in their header [the small block of fixed data which kicks off most file types], but not the dimensions. The resolution of an image is one of the most useless and confusing properties there is, and yet they stick this in the header. And not the dimensions. I have a book called Graphics File Formats and it cannot give me enough information about the JPEG format to extract the width and height of an image. Honestly, it’s enough to make you choke on your own vomit…
UPDATE: I have since filched some code from JPEGLIB to scan jpeg images for dimensions, so I won’t mention that again ;)
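For anyone facing the same problem, the general idea of such a scan can be sketched in Python. This is not the JPEGLIB code, just an illustration of the usual approach: walk the marker segments after the start-of-image marker until a start-of-frame (SOF) segment turns up, then read the height and width stored inside it. The function name and the hand-built sample bytes are made up for the example.

```python
import struct

# SOF markers are 0xC0-0xCF, except 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_dimensions(data):
    """Return (width, height) by scanning JPEG segments for a SOF marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers with no length
            pos += 2
            continue
        if marker == 0xD9:  # end of image
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                raise ValueError("truncated SOF segment")
            # Segment body: precision (1 byte), height (2), width (2), ...
            _precision, height, width = struct.unpack(">BHH", data[pos + 4:pos + 9])
            return width, height
        pos += 2 + length  # skip this segment (length includes its own 2 bytes)
    raise ValueError("no SOF marker found")


# Hand-built file start: SOI, a JFIF APP0 segment, then a SOF0 for 640 x 480.
sample = (b"\xff\xd8"
          b"\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
          b"\xff\xc0\x00\x0b\x08\x01\xe0\x02\x80\x01\x01\x11\x00")
print(jpeg_dimensions(sample))
```

Note that the height comes before the width inside the SOF segment, which is an easy thing to get backwards.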
And what’s really annoying about it is that the things that make it annoying are in fact far too tedious to even talk about, so I can’t even bitch about it without sending people to sleep. At least if you’re bothered by other humans/television/the weather you can complain in an engaging fashion.
Turning words into jargon:
MS Visual Studio .NET has replaced the term Workspace [used in previous versions] with Solution. I guess it is meant to inspire optimism and such.
A good rule of thumb is that just about any software-related thing that calls itself a ‘technology’ is probably going to greatly annoy me at some point. Everything software is really just a methodology, and anyone who tells you otherwise is trying to sell you something. [Update: I am probably being a hypocrite here because it is almost certain that somewhere on this very site I have described some of my own systems as ‘technologies’ eg my font rendering technology]
An interesting opinion on the never ending march of new software ‘technologies’ is had by Joel of Joel on Software. In his article The Law of Leaky Abstractions he talks about how no attempt to simplify a fundamentally complex system can ever be 100% successful… ie some of the underlying complexity will always "leak" through. So basically every time you build a shiny new [object oriented] system on top of a dusty old one, you don’t actually make it easier, because every now and then some weird behaviour will cause problems at the high level which cannot be understood without investigating things at the lower level.
Thinking of tech blog things, here’s a great one I’ve been reading recently, documenting some of the history/esoterica of the Windows API.
This is the first new entry written using the new blog-o-matic functionality of HTML Editor. In truth I am worried that this text will be accidentally whisked away into nowhere land, but we shall just see what happens.
One thing that feels weird is that new entries will now be written on their own pages… as they are stored in the archive. This makes it harder to spontaneously reference/revisit previous entries, and I might have to work out some kind of compromise with regards to this.
The blog should look largely the same as before, only now there will be permalinks at the bottom of each entry (except on the individual entry pages themselves)
I will probably also add some sort of "auto-build" which actually generates new archive pages where appropriate, as HTML Editor currently relies on the user to create every page.
To be honest the functionality still feels as messy as hell, because there is no way to get around having to upload potentially megabytes of HTML every time a change is made to a page template [eg to add a link to the left hand column]. Other tools will do that rebuilding on the server side, so you don’t need to think about it and to a degree the messiness can be hidden from you.
Holding on to WYSIWYG
One thing I would like to hold on to is the WYSIWYG nature of my blogging. As I type this I see it exactly as it will appear, ie I do NOT have to manually enter HTML tags or any of that guff. This has always been a bit of a challenge because I am still using Microsoft’s DHTML ActiveX control as my primary editor, and [ahem] it’s not exactly XHTML compliant, so occasionally I have to go in and add tags and attributes manually because they have been brutally stomped.
The way I have been blogging [until now] has been surprisingly simple. I simply edit the main log page directly. So basically I can see the whole page exactly as it will be seen on line, and then I can edit/delete/insert whatever entries I please. It only gets complicated when I have to do the archives, a process performed once a month or so which involves chopping the page in half and saving the oldest part separately, into a file which will probably never be edited manually again (but still can be).
What I think I would really like to do is to keep that same workflow but remove the manual archiving part, making a tool which will scan the "main" page as currently edited and spit out the separate entries and whatever other archive formats are required.
Yes. I think I’ll try to do it that way.
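As a rough illustration of what such a tool might do, here is a Python sketch that splits one hand-edited page into per-entry pages and monthly archives. Everything in it is an assumption made for the example: the `<!-- entry: ... -->` marker comment, the date-first entry ids, the output file names and the function names are all hypothetical, not how HTML Editor actually works.

```python
import re
from collections import defaultdict

# Hypothetical convention: each entry on the main page starts with a comment
# such as <!-- entry: 2003-09-14-jpeg --> (date first, then a short name).
ENTRY_MARKER = re.compile(r"<!--\s*entry:\s*([\w-]+)\s*-->")


def split_entries(page_html):
    """Return (entry_id, html) pairs in the order they appear on the page."""
    parts = ENTRY_MARKER.split(page_html)
    # parts[0] is the text before the first marker; after that the list
    # alternates between an entry id and that entry's HTML.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def build_pages(page_html):
    """Map output file names to HTML: one page per entry plus monthly archives."""
    pages = {}
    by_month = defaultdict(list)
    for entry_id, html in split_entries(page_html):
        pages["entries/%s.html" % entry_id] = html
        by_month[entry_id[:7]].append(html)  # "2003-09" from "2003-09-14-jpeg"
    for month, entries in by_month.items():
        pages["archive/%s.html" % month] = "\n".join(entries)
    return pages


MAIN_PAGE = """<h1>Log</h1>
<!-- entry: 2003-09-14-jpeg -->
<p>JPEG woes</p>
<!-- entry: 2003-09-02-mfc -->
<p>MFC gripes</p>
<!-- entry: 2003-08-30-fonts -->
<p>Fonts</p>
"""

for name in sorted(build_pages(MAIN_PAGE)):
    print(name)
```

A real version would also wrap each piece in the page template and stop the last entry before the page footer; this sketch simply treats everything after a marker as belonging to that entry.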
is that you end up with an ever growing number of pages, and you usually want at least 3 different ways of viewing an entry:
On a "Most recent" page
On a permanent page of its own (so that an item can be linked to directly and will stick around)
On a monthly archive page, or some other grouping (perhaps by topic/significance etc)
I’m trying to come up with a method that can handle all these options, but that still relies on simple HTML pages only (ie there is NO server side page generation going on). The problem is that as the blog grows, and as you start wanting nice features like auto-updating mini-indexes for recent entries, it gets to the point where a small change can require pretty much every single html page to be regenerated, which is time consuming [especially in the uploading that will need to be done.]
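One way to take some of the sting out of the uploading is to remember a hash of every page as it was last uploaded, and only send the pages whose content has actually changed. The Python sketch below shows the idea; the `changed_pages` function, the page names and the in-memory manifest are all hypothetical. It does nothing for a template change that touches every page, since then every hash changes too.

```python
import hashlib


def changed_pages(pages, manifest):
    """Return the paths whose HTML differs from the last upload.

    pages:    {path: html} for the freshly generated site
    manifest: {path: sha256 hex digest} from the previous upload; updated in place
    """
    changed = []
    for path, html in sorted(pages.items()):
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if manifest.get(path) != digest:
            changed.append(path)
            manifest[path] = digest
    return changed


manifest = {}
site = {"index.html": "<p>latest</p>", "archive/2003-09.html": "<p>old stuff</p>"}
print(changed_pages(site, manifest))  # first run: every page is new
site["index.html"] = "<p>latest, plus one more entry</p>"
print(changed_pages(site, manifest))  # second run: only the edited page
```

In practice the manifest would be saved to a file next to the local copy of the site so that it survives between sessions.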
So I guess what I’m griping about is the hassle involved with making a tool that can create consistent well linked content which looks like it’s got a back end, without actually having a back end.
So, is it worth doing? All I know is that every time I think: "Hey why don’t I look at setting up Movable Type?" I see the requirements and go "blech!" I just don’t want to get involved with server scripts and databases. Just setting up some formmail stuff was enough of a pain, and even then my scripts were deleted by my overzealous ISP for not being the "approved" versions.
…where everything seems, like, so LA-A-A-A-AME…
Software Development: see computers
Web logs: lame
I’ve been working on some blog-specific features for HTML Editor, and one problem that arises is: Am I going to convert this 12 months of blog into the new format, or shall I just break off into the new format leaving links back to the old? Being a programmer, it’s hard to imagine settling for a split system… I will probably devise a once-off conversion process just to keep things consistent.
There are some real advantages to the way it currently works, mainly that the separation of entries is totally arbitrary and I can edit several entries at once [I have no qualms about revising something written within the last week… any longer than that and I would probably put an "update" notice in there… unless I was getting rid of something embarrassing of course]. The only thing that makes it a pain to maintain is having to break up pages manually, and create the links between them. And even that only takes a minute or two every month, so really I shouldn’t be bothered to change things at all if I didn’t think other people might like a blogging tool which:
Doesn’t require cgi/php access on a web server, and doesn’t require knowledge beyond FTP login stuff
Allows you to work offline, with a local browsable version of your site [and not requiring a local HTTP server]
Allows arbitrary HTML formatting in the entries
One tricky thing is trying to decide how "general" to make the tool… ie should I simply concentrate on blogging features or should I try to generalize into a tool that can handle posting articles with user defined fields in various formats? My instinct tells me: be a generalist, but experience tells me: quite often you don’t know what you need until you actually need it, and even the best generic solution is always going to end up with bits of cruft tacked onto it for specific purposes. Why not start with a specific solution and generalize it later if there is a real need? I don’t know where this new pragmatism is coming from… I must be getting older.
So that’s pretty amazing, eh?
To celebrate I have converted the whole Jujusoft site to this green-white color scheme [which may look revolting to norms. Did I mention that I am red-green color-blind?].