Last week Urs and I had an interesting discussion about blogs, FOAF and their implications for self-inflicted loss of privacy. I personally think that society will embrace transparency in a practical and useful way, and that people will learn to handle and respect the fact that much more personal information will be around by default in the future. I especially like what Elke recently said to Gregor: "... desire of people to rebuild their existence online". All of this is a very interesting line of thought and I will have to elaborate on it further sometime. Some of my current projects have to do with several distinct aspects of this new publicness.
While I delved into deep thoughts, Urs immediately started with practical work and developed one of the things we touched on in the discussion into Postback, a proposal for posting into other blogs. Cool!
There are a few things I'd like to add. The proposal touches on two things: privileges and a mechanism for crossposting. I think that one should discuss these two issues separately for as long as possible to enhance reusability. For example, in a simple model, it would be possible to use FOAF as a place to store privileges and then use TrackBack-enabled categories for crossposting. Thus, only the receiving end needs to implement something.
As for security: if you only want to restrict the source blogs, and don't mind the possibility that some unknown rogue user crossposts articles from privileged blogs that shouldn't actually be crossposted (a relatively minor problem most of the time, since no real spamming is possible), then you only need to trust the source of the RDF statements that grant the privileges, and just using the self-hosted FOAF file is enough. We are getting at implementing the trust layer of the semantic web cake! PGP authentication would only be needed if you wanted external people to manage these privileges. Certainly an interesting direction to think about!
But on the whole the interesting part of the previous paragraph is that you authenticate other blogs instead of other people! Your security would basically state that you allow crossposts from these specific blogs into these specific categories. But who posts to these blogs is beyond this control.
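A minimal sketch of such a rule set, assuming the privileges have already been read from the receiving blog's own FOAF file into a plain mapping. The blog URLs, category names and the function name are made up for illustration; this is not part of the Postback proposal itself.

```python
# Hypothetical privilege table: source blog URL -> categories it may crosspost into.
# In the model described above, this would be derived from the self-hosted FOAF file.
ALLOWED_CROSSPOSTS = {
    "http://blog.example.org/urs/": {"postback", "privacy"},
    "http://blog.example.org/elke/": {"privacy"},
}


def may_crosspost(source_blog: str, category: str) -> bool:
    """Return True if posts from source_blog are accepted in the given category.

    Note that this authenticates the blog, not the person posting to it.
    """
    return category in ALLOWED_CROSSPOSTS.get(source_blog, set())
```

The check says nothing about who wrote the post, which is exactly the limitation described above.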
If you wanted to control this too, then you would have to look at things like Ken MacLeod's foaf-check and build challenge/response authentication into the transport part of Postback. OTOH, this would be an interesting addition for TrackBack et al., too.
I use "crossposting" above, since I believe that these posts should at least also appear on one's own blog. Otherwise we would just have a group blog, which I think is a different thing in terms of carrying personality! OTOH, implementing group blogs through privileges expressed in FOAF is a neat reuse of the privileges part. Maybe a whole CMS privilege system can be expressed in this way? Can we reuse existing vocabularies in the FOAF module?
I just came across darcs, a revision control system. It is not just yet another CVS replacement; no, it is seriously cool. Well, that is, at least if you have - like me - a background in theoretical physics.
One thing is that it is written in Haskell, a purely functional programming language. But even cooler is that it is actually based on a theory of patches, in which patches are seen as being analogous to the operators of quantum mechanics!
Darcs doesn't have a central repository; every checkout is its own repository and consequently also its own branch. Patches are coherently defined operators that get your source code from one state to another. They can be applied in different orders (that is, if they commute), can be merged, have an inverse, etc., and all these operations are defined in a nice theory heavily oriented around quantum mechanics lingo. Very cool. At least for me :)
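To make the operator picture a bit more concrete, here is a toy model in Python. It is only an illustration under a strong simplifying assumption (a patch replaces the content of exactly one line); it is not how darcs represents or commutes patches, and all names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Toy patch: replace the content of a single line (not darcs' real model)."""
    line: int
    old: str
    new: str

    def apply(self, source: list[str]) -> list[str]:
        """Take the source from one state to the next, like an operator."""
        if source[self.line] != self.old:
            raise ValueError("patch does not apply to this state")
        result = list(source)
        result[self.line] = self.new
        return result

    def inverse(self) -> "Patch":
        """The patch that undoes this one."""
        return Patch(self.line, self.new, self.old)


def commute(p: Patch, q: Patch) -> bool:
    """In this toy model, two patches commute when they touch different lines."""
    return p.line != q.line


state = ["a", "b"]
p = Patch(0, "a", "A")
q = Patch(1, "b", "B")
# Commuting patches give the same result in either order.
assert commute(p, q) and q.apply(p.apply(state)) == p.apply(q.apply(state))
# Applying a patch and then its inverse restores the original state.
assert p.inverse().apply(p.apply(state)) == state
```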
Of course, such a system allows for a few excellent excuses:
"I know it doesn't work, and I have checked all the latest patches. But every time I look at one of them closely, I can't reproduce the problem anymore." (Heisenberg Uncertainty Principle)
"Your code worked, but then I looked at your patch and I saw that actually, it could never have worked. That was the moment when it stopped working." (Schrödinger's Cat)
"Your code does all these wonderful things, but as soon as I look at your patches alone all but one feature stop working. And every time it is a different one." (Decoherence)
Amazing. Watch The Memory Hole's 5-Minute Video of George W. Bush on the Morning of 9/11. After being informed that a second plane had hit the WTC and that America was under attack, he just sits there. And. Does. Nothing. For. Five. Minutes.
Someone should edit this video with insets of video footage of the WTC in one corner and a counter of people calling in to offer their immediate help (donating blood, etc.) in another.
For your American friends: American Traveler International Apology Shirt. Also have a look at the explanation for the non-English version. In fact, on my last backpacking trip, most US citizens introduced themselves with "I'm from the US" immediately followed by "And I hate Bush". Or they just said that they were from Canada...
Jeremy Zawodny runs a post from the MySQL mailing list, stating that High Availability is NOT Cheap. I agree with most of the points.
But there are two things I'd like to add: where you should start working on high availability, and why it is more worthwhile than you might think.
(the first part is quite technical, you might want to skip to the "is it worthwhile?" part below)
The quoted post first asks you about the acceptable outage rate. This is really a good question and it is important that you ask it at the right level. Revisit the end-to-end argument. That is, if you only run a few applications on your system and you have control over them (as is typical in a website scenario), then you can reach a reasonably high availability for a relatively low price by getting the ends right instead of focusing on the low-level components: high availability of a 100% concurrent database server is pretty hard, but changing your application (the end in the argument) to handle outages sensibly is often much simpler for the same or a higher degree of reliability! More specifically, you can answer the "how many nines do I need?" question for each component of the application separately, letting you focus on the important problems while not driving up costs for the unimportant ones.
Let me give you a (slightly simplified) example: the adserver of search.ch. The adserver is responsible for choosing and displaying the different forms of advertising present on search.ch. A lot of data is involved: the currently running campaigns and their rules for when, where and by whom a banner should be seen; the number of times each ad has already been shown (to get the scheduling right); the history of banner views of the active users (so that we don't bore you with the same advertising all the time); and the logs of when, where and by whom an ad was seen and maybe clicked on. Worst of all, only a small fraction is write-once-read-many (which is really the simplest form of application to distribute if you think about it). As there is advertising on all popular pages of search.ch and we don't want broken images or unnecessary delays, the availability requirement for this component is really high. Really? What is really important is that I can keep serving the right banners to the right places. Much less important is that I never lose track of a view or a click; if the system 'forgets' a few, we will automatically run the campaign a little longer. The most critical part is easy to distribute: just replicate the campaigns/rules data set to every node in the adserver cluster. If one goes down, the others take over. All the write operations on the data set are spooled and then synchronized regularly; if a server goes down, the small pocket of information in its spool will arrive late or, in the worst case, be lost. In this case that is not severe. I guess the total amount of money we have actually lost this way is on the order of a good espresso :-) What we did was look at a rather complex problem at the database level (a lot of reading and writing, information that needs to go to every node, etc.) and, by asking the right questions at the right level, turn it into an almost embarrassingly parallel problem, gaining cheap high availability by routing around the hard problems.
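The pattern can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not the actual search.ch adserver: the class, its methods and the in-memory "central" counter are all assumptions made for the example.

```python
class AdServerNode:
    """One node of a hypothetical adserver cluster."""

    def __init__(self, campaigns):
        # Every node holds a full replica of the read-mostly campaign data.
        self.campaigns = dict(campaigns)
        # Writes (views, clicks) are only buffered locally at first.
        self.spool = []

    def serve(self, banner_id):
        """Serve a banner from the local replica and spool the view."""
        self.spool.append(("view", banner_id))
        return self.campaigns[banner_id]

    def flush(self, central_counts):
        """Synchronize the spooled events into the shared counters."""
        for _event, banner_id in self.spool:
            central_counts[banner_id] = central_counts.get(banner_id, 0) + 1
        self.spool.clear()


campaigns = {"b1": "spring campaign"}
node_a, node_b = AdServerNode(campaigns), AdServerNode(campaigns)
counts = {}

node_a.serve("b1")
node_b.serve("b1")
node_a.flush(counts)
# Suppose node_b crashes before flushing: serving continues on node_a,
# and only the one spooled view on node_b is lost.
```

Serving never depends on the counters being reachable; the price is that a crashed node may lose whatever sat in its spool.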
Of course, redundant switches, load balancers and a bunch of PCs are needed, but all of this is commodity hardware and not really expensive. The devil is in the details here, too, so don't underestimate the time it takes to set this up. But for most applications, you don't need mainframes if you control the ends and are willing to think through the problem spaces for all of these ends.
So is it worthwhile? This is the second thing I'd like to ask, and IMHO the answer is yes. But I'm not talking about five-nines availability here. I am talking about basic redundancy in your systems to avoid most emergency actions. Surprisingly, this will easily put you in the 99.95% range, where user errors (when administrating or changing things) will top your reason-for-failure charts along with IBM disks. The difference is in quality of life, when "disk crashed" or "server down" doesn't mean "RUN! NOW!" but "hey, I'll be in that region anyway tomorrow, I can drop by and check out what's going on". For everyone who works in such environments this remark is pretty obvious, and all the others will probably underestimate the effect. Some of our most popular services have to run on multiple machines anyway for performance reasons, but since we migrated all popular services that could easily run on single machines to the distributed architecture, our maintenance costs have decreased significantly! That is, all our applications (ends in the sense above) run distributed, and as soon as you get used to the idea, implementing things this way is only a small extra effort with a huge payback. That's why, in my opinion, thinking along the lines of high availability is always worthwhile.
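For a sense of scale, the availability figures translate into yearly downtime as follows. This is plain arithmetic, assuming a 365-day year; the function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# 99.95% leaves a bit more than 4 hours of downtime per year,
# while five nines (99.999%) leaves only a little over 5 minutes.
basic_redundancy = downtime_hours_per_year(99.95)
five_nines_minutes = downtime_hours_per_year(99.999) * 60
```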
BTW, the first comment in Jeremy's post (about the consultant with a chainsaw) reinforces my impression that consulting has a lot to do with show biz. Of the B-kind in this case :-)
Remember Rain Man with Dustin Hoffman? Some autistic people (so-called savants) can exhibit genius-like behavior, and the question of how this works might lead to interesting knowledge about the way we think. Two recent articles highlight this amazing phenomenon:
NY Times: Savant for a Day
Weltwoche: Insel der Begabten
The TMS apparatus mentioned in the first article sounds especially intriguing. One understanding of this phenomenon seems to be that a normal brain has, in some sense, so many layers of abstraction that basic capabilities like fast integer arithmetic get hidden. TMS (Transcranial Magnetic Stimulation) tries to shut down some parts of your brain to magnify the capabilities of the rest (or to expose this lower layer ...). Downright scary is one of the military projects mentioned: shutting down parts of a soldier's brain to increase battle performance in stressful situations or when the soldier hasn't had enough sleep.
Oh, and by the way, this would break Aaron Swartz's Algorithm for Determining Imagination from Reality :-)
Gregor notes that Democratic US presidential candidate Dennis Kucinich has also started his blog. Like Gregor, I think that having politicians with frequently updated blogs is a welcome change.
Technologically much further ahead is US presidential candidate Howard Dean, with several longer-running blogs. He also uses meetup.com and other tools quite effectively. I think that his campaign is a very good example of a grassroots internet marketing campaign. Dana Blankenhorn has a good summary and commentary in his article A Visit to Deanistan. If you know someone in politics, then please make her or him read this!
The presentation of Haystack was surely one of the (many) highlights of the 12th WWW conference in Budapest. You could feel how everybody became more and more excited, to the point where it was almost unbearable to sit still when you really wanted to jump up and down in excitement. For me, this was one of the most vivid live application presentations of a semantic web vision. (Unfortunately, as an island and in its current state, playing around with Haystack is only a fraction of the fun of watching Dennis Quan demonstrate the power of this model.)
The folks at the MIT Laboratory for Computer Science are also publishing some interesting papers on other things they came up with when building Haystack. One is the notion of User Interface Continuations (via Chris Langreiter). Very interesting read, indeed. The idea is to capture an ongoing operation in its current state, which implies complete modelessness and the ability to even save and distribute curried (as in functional programming) operations.
This made me wonder whether this is in some sense a backport of a web UI element to traditional applications, like the back/forward buttons as a generalized undo for things like changing directories, viewing mails, etc. In a GET-based and session-id-less web application, the complete state of an ongoing operation is captured in the URL. It can be bookmarked and sent via mail. You can also run several of these operations in parallel. You can even curry operations for later use, like this phonebook form for queries in Bern. So it occurs to me that this is quite similar to what is described in the article, isn't it?
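A small Python sketch of the "curried URL" idea. The base URL and the parameter names are invented for the example; it only illustrates that a GET URL can carry a partially filled-in operation that is completed later.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical phonebook endpoint, used only for illustration.
BASE = "http://phonebook.example.org/search"


def query_url(**params):
    """Capture an operation's complete state in a bookmarkable GET URL."""
    return BASE + "?" + urlencode(params)


def curry(url, **more):
    """Return a new URL with additional parameters fixed on top of url."""
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    params.update(more)
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(params)}"


# A "curried" query: the city is already fixed, the name is still open.
bern_only = query_url(city="Bern")
# Later, someone else completes the operation.
full_query = curry(bern_only, name="Meier")
```

The intermediate URL can be mailed around or bookmarked just like the finished one.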
In my post about query statistics for search.ch yesterday, I wrote about the importance of the implicit notion of ranking in all sorts of result pages: users increasingly expect some intentional meaning in the order in which the items are presented, not a mere reflection of the underlying database structure.
Much too often you still find sites whose designers merely construct a few pages around a given database design, making the user think like a computer scientist to find the required information: for example, having to answer none of the questions in a form to get an overview of all items, and generally only exposing things through complex query interfaces, even if browsing would be perfectly possible (e.g. for short lists, like upcoming events). These are designs that help the computer instead of the user. Probably this is often due to the use of seemingly easy toolkits, which expose the database content almost 1:1.
Some even ask the user to enter information in awkward syntaxes. Jeremy Zawodny recently pointed to a hilarious site covering this phenomenon, the "No Dashes Or Spaces" Hall of Shame: a list of sites which can't even do the simplest transformation on input data, e.g. stripping non-digit characters. Many more can be found with this query.
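How little it takes is easy to show. A minimal Python version of "stripping non-digit characters" (the function name is made up):

```python
import re


def digits_only(raw: str) -> str:
    """Drop everything except the ASCII digits 0-9 from user input."""
    return re.sub(r"[^0-9]", "", raw)


# Dashes and spaces typed by the user are simply removed.
card_number = digits_only("4111-1111 1111-1111")
```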
For a presentation, we collected some statistics on user behavior on search.ch, and I thought it might be interesting to share them with a wider audience on this blog. Also, Tim Bray's second article on search technology provides some commentary on this kind of data, as it is probably more or less similar for all public search engines.
| Words | Share of Queries |
|---|---|
| 1 | 53.0% |
| 2 | 30.3% |
| 3 | 11.8% |
| 4 | 2.9% |
| 5 | 1.2% |
| 6+ | 0.7% |
Tim quotes an average length of 1.3 words for queries. On search.ch we have a slightly higher average of 1.7 words, but it is still true that the single-word query is the most popular way of using search engines and that a lot of optimization effort has to go into this kind of query.
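The 1.7 figure can be roughly reproduced from the word-count table. The sketch below assumes that the "6+" bucket counts as exactly six words, so it slightly underestimates the true mean.

```python
# Share of queries (in percent) by number of words, from the table above.
# Assumption: the "6+" bucket is treated as exactly 6 words.
SHARE_BY_WORDS = {1: 53.0, 2: 30.3, 3: 11.8, 4: 2.9, 5: 1.2, 6: 0.7}


def average_words(shares):
    """Weighted mean of the word counts, normalized by the total share."""
    total = sum(shares.values())
    return sum(words * share for words, share in shares.items()) / total


mean_length = average_words(SHARE_BY_WORDS)  # close to 1.7 words per query
```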
| Position | Share of Clicks |
|---|---|
| 1 | 57.0% |
| 2 | 12.3% |
| 3 | 7.9% |
| 4 | 4.8% |
| 5 | 3.5% |
| 6 | 2.7% |
| 7 | 2.1% |
| 8 | 1.6% |
| 9 | 1.6% |
| 10 | 1.5% |
| 11-20 | 2.6% |
| 21+ | 2.4% |
The first page counts. Users rarely click through to the second page in the hope of finding better results; they tend to issue a refined query instead. I think that in many of the cases when someone does go to the later pages, he or she actually wants to get a deeper overview of everything that matches the query (as opposed to scanning for the few relevant hits).
| Results | Share of Sessions |
|---|---|
| 10 | 91.7% |
| 20 | 3.7% |
| 30 | 1.8% |
| 40 | 0.8% |
| 50+ | 2.3% |
Quite telling is the statistic of the share of clicks each result position yields. Actually, this data is a little aged, since collecting it means turning on redirects on the results, which I don't want to do too often. Anyway, you can see that the action is really concentrated on the top spots. The top three get more than 75% of all the clicks.
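The "more than 75%" claim follows directly from the click-share table by position; a short check in Python, with the values copied from that table:

```python
from itertools import accumulate

# Share of clicks (in percent) for result positions 1 to 10, from the table.
CLICK_SHARE = [57.0, 12.3, 7.9, 4.8, 3.5, 2.7, 2.1, 1.6, 1.6, 1.5]


def cumulative_share(shares, top_n):
    """Sum of the click shares of the first top_n result positions."""
    return list(accumulate(shares))[top_n - 1]


top_three = cumulative_share(CLICK_SHARE, 3)   # about 77.2 percent
first_page = cumulative_share(CLICK_SHARE, 10)  # about 95 percent
```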
I think this emphasizes how much you should care about the ranking in the search and search-like components on your sites. The search slot metaphor is popular in users' minds, and they expect the first match to be the most important. So, if you sort your matching products alphabetically, you probably miss more potential than you think. With bad luck you might even be presenting an out-of-print item as your top result...
Idle Words' Weblog Crawl Report currently finds more than 460'000 blogs and gives very interesting statistics on language distribution (although Live Journal's 500'000 active journals are obviously missing). While it is no surprise that English dominates, what makes up the other half of the blogosphere caught my interest: obviously blogging is very popular in Brazil (more than 50'000), Poland (almost 40'000) and Iran (more than 20'000). Iceland features at least 3500 blogs with a population of just 160'000, so more than 2% are blogging there! Often, the driving factor seems to be popular hubs and blog hosters like Blogger Brasil, blog.pl and Persian Blog. Via the interesting blogcount.
So, where is Switzerland?
Obviously close to my own business, Tim Bray launches a series of articles on full-text search, starting with an article on the business surrounding search engines: On Search: Backgrounder.
He remarks, and I agree, that under the hood "all search engines work more or less the same". I think that today the difference lies in knowing how to integrate a search facility in a really useful way, as this can often make a huge difference in the quality of a site or tool.
Britt Blaser posted a nice story about how to fill your life, that I think is quite memorable. I relate it to a poem that I once saw on display in a cafe: If I Had My Life Over - I'd Pick More Daisies.
After abandoning development of a standalone Internet Explorer for Windows, Microsoft announced last week that they are also halting development of the Mac version. I guess that this is no big loss for users, and web designers who wanted to be really sure that their pages look decent on IE had to use a Windows machine anyway, as IE wasn't compatible enough with itself... But this gap will grow, as IE will live on in Microsoft's next OS version but not on any other platform.
Very interesting, though, is the reason Microsoft is giving: "Safari is turning into a better answer for (Apple) customers." (They also hint that this is due to their lack of "access to the Macintosh operating system that it would need to compete"; AFAIK this is just spreading FUD.)
See also Christian and the comments there on this subject.
Hmm, this reminds me of an old IE for Mac bug which I once had to track down: There is a rare HTTP header, "Content-Location", which is set by Apache when you use content negotiation (to deliver the content in the right language and encoding, for example). Then there is the commonly used HTTP header "Location", which - in combination with 302/303 HTTP status codes - initiates a redirect. Strangely enough, IE4 for Mac kept redirecting the user to the actual filename of the script, stripping away everything after the '?' and thus making queries on content-negotiated scripts impossible. Probably, they looked for headers with strstr().
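If that guess is right, the bug would look roughly like this. The sketch uses Python's substring search as a stand-in for strstr(); it is a reconstruction of the suspected mistake, not IE's actual code, and the header values and function names are invented.

```python
def find_header_naive(raw_headers: str, name: str):
    """Buggy lookup: plain substring search, the analogue of strstr()."""
    pos = raw_headers.find(name + ":")
    if pos < 0:
        return None
    end = raw_headers.find("\r\n", pos)
    if end < 0:
        end = len(raw_headers)
    return raw_headers[pos + len(name) + 1:end].strip()


def find_header(raw_headers: str, name: str):
    """Correct lookup: compare the complete field name of each header line."""
    for line in raw_headers.split("\r\n"):
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == name.lower():
            return value.strip()
    return None


# A normal 200 response from a content-negotiated script: no redirect at all.
RESPONSE_HEADERS = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Location: search.en.cgi\r\n"
    "Content-Type: text/html\r\n"
)

# The naive search finds "Location:" inside "Content-Location:" and would
# "redirect" to the bare script name; the proper lookup finds no such header.
wrong = find_header_naive(RESPONSE_HEADERS, "Location")
right = find_header(RESPONSE_HEADERS, "Location")
```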
(technical post ahead)
I wouldn't have thought that I would link to an MSDN article this soon. But then, it is a Microsoft article about performance (!) and anyway, my standard excuse for bad code has always been 'performance issues' :-)
Writing Faster Managed Code: Know What Things Cost by Jan Gray. Interesting read, even if you avoid .NET. And don't miss Tim Bray's comments.
"I promise I will not ship slow code. Speed is a feature I care about. Every day I will pay attention to the performance of my code. I will regularly and methodically measure its speed and size. I will learn, build, or buy the tools I need to do this. It's my responsibility."
I think that it is useful to think, at least once, about how your programming language actually does all those nifty things under the hood. Not only to be aware of performance killers: I often feel that this understanding gives me more comfort in what I do, and depending on the feature in question, seeing through the levels of abstraction is very helpful when tracking down a bug.
Very interesting are his numbers on CPU cache effects towards the end of the article. Understanding how memory is managed by your runtime environment will give you a handle on which data structures and algorithms use your CPU's cache effectively and which don't. While working on our search engine we found several significant optimizations of this kind, often rather nontrivial, and often with slightly more complex code as a result. The search engine is written in C; keeping a handle on such issues through abstractions is not easy. But having the CPU cache in mind and getting these things right is, in our experience, one of the most important and most generally underestimated factors if you work with a lot of data and your goal is to write software with good performance.
People working with compiled languages on Linux might want to check out Valgrind, which is not only one of the most useful debugging tools if you do your memory management yourself, but also features a nice cache profiler.
Update: Christian Langreiter links two interesting articles about the important role of the L2 cache: Treating DRAM as a paging device and Memory Hierarchy and Data Locality.
Thomas Burg: Microcontent Management Systeme - Weblogs als Business Anwendung: Nice introduction to blogs, short comparison with more traditional CMS systems and an overview of possible business uses.
I just read that in South Korea 70% of households have broadband access. And not just the meager 256 kbps or 512 kbps that is typical here; there, the typical connection is 2 Mbps. Wow! And I thought I lived in a relatively well-wired country...
As this older article about South Korea's new president implies, the impact of the net in politics and society is huge and probably a very interesting preview to what we can expect here in the future. If I just extrapolate what always-on changed for me to a whole society...!
Unfortunately I don't speak Korean, so I can't check for myself. It would be very interesting to find out how the whole media industry and the style of marketing have adapted to the web there (the article gives a few hints, but how have advertising spending figures changed, and what about the content?). How much more connected are people to each other there? Did the average Korean expand his (virtual) personal network in different verticals, as is often suggested? How distributed is the traffic across sites (are there a few winners which more or less are the Korean web, are there many smaller sites, or is even micro publishing becoming an important factor)? 2 Mbps offers a much better average online video streaming experience than we know here; what does that imply? Many more questions come to mind.
Just out of curiosity, I tried to compile a list of blogs in and around my home town of Berne. You can find it on the main page, right below the blogroll under Bern Blogs.
Please mail me if you know others.
It will be interesting to watch this list grow over time.
"Die Kernkompetenz der Schweiz ist Diskretion" (Switzerland's core competence is discretion)
He said this while noting that one of the trends supported by upcoming technologies (including blogs!) is increased transparency, and that people will come to expect it more and more and will learn to live with it.
In Switzerland, two electronic payment systems for private users will be introduced this year. One is yellowbill by the Swiss Post, which is already online, and the other is paynet, which should be implemented by most major banks by the end of the year. Note that it was actually possible for most banks (except the Swiss Post's bank, which handles most offline bill payments) to agree on a common, neutral standard!
Unfortunately, their design is bloated. I will try to present a much simpler solution.
Both systems work similarly: when you order something and give your address, you can tell companies supporting the system your financial institution belongs to that you would prefer to handle the bills electronically. This is not the same as LSV (direct debit), where the amount is automatically deducted from your account. Rather, ticking this option and giving the biller your EBPP ID will result in all your bills appearing in your online banking application rather than in your mailbox. That is, of course, you will see them the next time you log in to your virtual bank. (I am not sure yet whether I find this delayed confrontation with my bills a good thing or not :) ).
In both systems, the actual bill is still stored on the biller's server and not at your bank. This is a good thing; otherwise your bank could learn not only who you buy things from but also exactly what you buy there! Naturally, this implies some mildly complex software at your bank and your biller, as you would not want your bill to sit at a publicly accessible URL. (I don't know the details, but I hope nobody is relying on security through obscurity, i.e. supposedly unguessable URLs; probably it will be some sort of signature in the URL and/or a server-to-server communication between your biller and your bank.)
This makes clear why the target market for this system is the bigger companies, which send many 100'000 bills a month. The overhead of introducing something like this into these companies' internal billing systems and public web sites / services will be huge, and the work complicated. OTOH these are also the companies which have the biggest savings opportunities.
If users actually start using this in big numbers, that is.
User conversion! Of all the companies that plan to use yellowbill, I have a billing relation with exactly one. I seriously doubt that many people will go through the paperwork to sign up for either of these systems and will actually remember their IDs on their next orders, if most of their bills still come the old way anyhow. And it's not as if there are any signs that this ratio will change anytime soon: the system is only targeted at big companies, and most of my bills come from smaller companies or even small societies. You know, the kind where someone actually had to put the bill in the envelope by hand! It's the chicken and egg problem, only magnified.
I fear that these EBPP systems will spread only very slowly, because of a Big Problems Call For Big Solutions mindset: getting all the banks to agree on a standard was a big effort, so the system implementing it supposedly ought to be similarly big and encompassing.
Now then, big words you say! What do I propose, you ask?
OK, you probably didn't fall off your chair because of this one. But think about it. What replaced a lot of traditional paper mail? E-mail! Everybody understands this: "Would you prefer to get your bills by e-mail?" I would want it! How do I get my bill presented? That is not my bank's design decision! It will depend on the capabilities of the biller: my local astronomy society will be happy to remind me of the reason in plain ASCII in the mail. Others might want to attach a PDF. Others (like many online shops) already have an online "past orders" system which can easily be reused here. There are so many useful things I can do with e-mails: I can print them, I can archive them and I can even forward them (as, luckily, I don't have to pay all the bills I get). And I already know how to do that.
That was the easy part; how about the billing information? This would be four units of information: the account number, the reference code, the due date and the amount. The first two have check digits (checksums), the third comes in a few standard formats, and the last is often preceded by the currency symbol or ends with ".-" or ".x0". It wouldn't be too difficult to add a "paste the payment part of your bill here and I will try to figure out what I have to do" field to the current online banking applications. OK, granted, this one needs a little learning and a tad of confidence from the user, but it is still much simpler than signing up for either of the above-mentioned systems. An alternative would be to forward the relevant parts of the mail to your bank, which you could tell either to pay the bill directly (but that would require some sort of sender authentication) or to make the bill appear in your next banking session.
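To give an idea of how little parsing is involved, here is a rough Python sketch for the amount alone. The accepted currency prefixes and the apostrophe as thousands separator are assumptions made for the example, and the checksum validation of account number and reference code is deliberately left out.

```python
import re
from decimal import Decimal

# Assumed notations: optional "CHF", "Fr." or "SFr." prefix, an optional
# apostrophe as thousands separator, and cents written as ".50" or ".-".
AMOUNT_RE = re.compile(r"^(?:CHF|Fr\.|SFr\.)?\s*(\d+(?:'\d{3})*)(?:\.(\d{2}|-))?$")


def parse_amount(text: str) -> Decimal:
    """Turn a pasted amount such as 'CHF 120.-' into a Decimal."""
    match = AMOUNT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognisable amount: {text!r}")
    francs = match.group(1).replace("'", "")
    cents = match.group(2)
    if cents is None or cents == "-":
        cents = "00"  # "120" and "120.-" both mean no cents
    return Decimal(f"{francs}.{cents}")
```

For example, `parse_amount("CHF 120.-")` and `parse_amount("Fr. 45.50")` both yield plain decimal amounts that a banking application could prefill into its payment form.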
And there is another way (a little more complex, a little more integrated): create a simple file format with its own suffix and MIME type, let your banking application register for this suffix and MIME type, and then you can attach payment information to your e-mails. Double-clicking such an attachment will add the bill to your next banking session (no need to log in now, just note it). Or you drag'n'drop it onto your running banking session. Creating these attachments is a matter of simple tools, which can even be web-based.
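A sketch of what such an attachment could look like, using Python's standard email package. Everything specific here is an assumption for illustration: the MIME subtype, the file suffix, the use of JSON as the payload and the field names are invented, not an existing standard.

```python
import json
from email.message import EmailMessage

MIME_SUBTYPE = "x-payment-slip"  # hypothetical, not a registered MIME type
SUFFIX = ".payslip"              # hypothetical file suffix


def bill_mail(sender, recipient, account, reference, due_date, amount):
    """Build an e-mail with a human-readable bill and a machine-readable slip."""
    slip = {"account": account, "reference": reference,
            "due": due_date, "amount": amount}
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Bill, due {due_date}"
    msg.set_content(f"Please pay {amount} to account {account} by {due_date}.")
    msg.add_attachment(json.dumps(slip).encode("utf-8"),
                       maintype="application", subtype=MIME_SUBTYPE,
                       filename="bill" + SUFFIX)
    return msg


def read_slip(msg):
    """What a banking application registered for the type would do on open."""
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/" + MIME_SUBTYPE:
            return json.loads(part.get_content().decode("utf-8"))
    return None
```

The mail stays readable for anyone without the banking application, which keeps the low-tech path open.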
So, this version won't get you down to three clicks to pay your bills (as advertised by yellowbill), but almost. Much more important is that everyone can issue electronic bills like that. Many people even do something very similar already ("Hey, remember when we bought that gift for Steve together, you still owe me your part..." including account number and amount).
A very low tech solution that implements only its own part. Completely open, simple to integrate, simple to extend, very useful.
Why do I write it up? I had the opportunity to plug this concept to one of the usability guys involved in one of those projects. Maybe this blog entry will serve to support this idea.
This is why video footage of usability tests is recommended for convincing programmers (or web designers or managers).
After lurking in the blogosphere for several years, I am now finally getting around to starting my own blog.
Originally, I wanted to start with an entry explaining how I personally understand the concept of blogging and why I consider it to be an important movement. I wanted it to be a great First Post. And while working on it and thinking about it, time passed, I came across several things that I would have wanted to blog if I had already written that very first post... Now, enough is enough.
So I guess I learned my first lesson about blogs even before my first entry :) Today I decided to start anyway and to develop these thoughts over time in several entries. I think there will be other more interesting things for you to read here, anyway!