Fiber to the People, an interesting Wired article by Lawrence Lessig on why we might be better off if the information highway (wow, long time since I used that expression) were owned by the general public, just like the offline highways and streets.
Of course, he is not arguing for a back-to-monopolies situation, but rather against the incentives that are claimed to be driving forces for former-monopolies and de-facto-monopolies.
Is - from a national economic perspective - the impact of a widely deployed, pervasively accessible and open telecommunications infrastructure not more important than the savings on initial investment gained in exchange for monopolized rights to innovate?
The article argues that the cost-saving competitive effects would all still kick in, as the market remains open as to who actually builds the network. And it is open as to who operates the network. But the owners determine who can innovate on the network. (So it boils down to who finances the initial payments, doesn't it? I'm no economist.) And if the people own the network, they can enable innovation at the edges of the network. Look at it from the perspective of a national economy: The network infrastructure is a commodity that many startups and big companies alike can use to innovate. The general public gets proficient with technologies and ways of thinking that other countries will lag behind in by years or decades.
Read the article and compare it to the situation with the mobile companies. Three companies are installing antennas all over the country. Does this competition really lead to an overall cheaper mobile network infrastructure? I doubt it. (Let the installing companies like Nokia and Siemens compete!) Do they enable innovation? Ever tried to get a two-way SMS service up and running without awkward interfaces? You will probably be stopped by steep initial investments. Ever thought that increased scale and use would lead to cheaper prices? Impossible, even with prices way above the actual operating costs. The need for a few companies to raise the investment in infrastructure on their own leads to artificially high prices (compare the price of an SMS with the operating costs it causes) and high barriers to entry for other participants, blocking innovation for a whole industry sector.
Imagine if the roads were owned by maybe three companies. They would have built three highways from Bern to Zurich, and to earn back the money, they would limit business use to models they can control (and throttle their introduction to a speed they can handle). Maybe they wouldn't allow vans other than their own and would take 50% of the profit from reselling the goods 100km away? Well, at least we would get cars for 1.- CHF on the condition that we only use one of the three highways for the next two years! Sounds ridiculous, but interestingly, the same political forces that argue for investments in the public road network (instead of maybe public transport) for exactly this economic reason argue against drastic commoditization of the telecommunication network.
Turn the network into a commodity, and watch the sector that stands on it grow into an international leader.
The Netzticker by the Netzwoche is a daily newsletter about the latest events in the Swiss web industry plus a few very important international news items. It averages maybe eight headlines a day and is sent out at noon. I guess that a large percentage of people working in this industry here read it after lunch.
The problem is that, for a web-centric publication, it - in my opinion - doesn't get the web in several important ways. The thing is, they email a form with checkboxes for each headline. Decide for each whether you want to read it, submit and wait for the mail with the real content. This doesn't work in some mail clients and some server configurations (for security reasons), so they recently added a link where you can do the same thing on a webpage - of course you then have to enter your email address every time. Annoying.
This interface makes me think about the relevance of a news item twice, because it comes in two mails. The first time with limited information. Annoying.
And there is no searchable archive. Annoying.
There are no webpages of the articles. So I can't bookmark and can't link. Annoying.
At least I can go back, find old headline mails and request these messages, but the mail's subject will then claim that they are today's headlines. Annoying.
Sometimes, their software is broken or just plain slow. Like today. There is an intriguing headline about increased open source usage in big Swiss companies, but I can't get it. I will, eventually, but then I won't be able to link to it and tell you about it. Very annoying.
Usefulness would increase dramatically if they just sent out a bunch of links to an online version (of course, these pages would then have links to all of today's other headlines). The archive could be permanent and accessible to search engines. People could make bookmarks and links. More useful, more traffic.
Their argument for keeping it this way is that this system gives them information on how popular each headline is. Wrong. On the one hand, you can do the same thing with your access logs anyway. More importantly, access to archives will give them much more information on what is really interesting in the long run. Plus, referer logs from search engines will provide additional insight. And their current statistics are skewed towards catchy headlines. They could just use any standard system and save costs.
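To make the access log point concrete, here is a minimal sketch in Python. It assumes the articles would live under a hypothetical /articles/ path and that the server writes Common Log Format lines; the sample lines are invented for illustration, I have no idea what their setup really looks like.

```python
import re
from collections import Counter

# Pulls the requested path out of a Common Log Format request field,
# e.g. "GET /articles/open-source HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def count_article_hits(log_lines, prefix="/articles/"):
    """Count requests per article path. The /articles/ prefix is hypothetical."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(1).startswith(prefix):
            hits[match.group(1)] += 1
    return hits


# Invented sample lines, only here to show the shape of the input.
sample_log = [
    '10.0.0.1 - - [12/Nov/2003:12:01:00 +0100] "GET /articles/open-source HTTP/1.0" 200 5120',
    '10.0.0.2 - - [12/Nov/2003:12:03:10 +0100] "GET /articles/open-source HTTP/1.0" 200 5120',
    '10.0.0.3 - - [12/Nov/2003:12:04:42 +0100] "GET /articles/adsl-prices HTTP/1.0" 200 4300',
    '10.0.0.3 - - [12/Nov/2003:12:04:43 +0100] "GET /style.css HTTP/1.0" 200 900',
]

# Most requested articles first.
for path, count in count_article_hits(sample_log).most_common():
    print(count, path)
```

Any standard log analyzer does this and much more; the point is only that headline popularity falls out of the logs for free.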
Of course, if they'd really get a clue, they would - on top of all that - provide an RSS feed.
Update: The technical problems on the Netzticker site mentioned above appeared again today: I submitted the headlines and instead of a mail I just got a webpage with an error message. But this time, I was able to track down the problem: You see, one thing that this format actually does enable in a way the web or RSS don't is that with the same HTML form that orders your news you can answer one of their polls. This should actually lead to much higher voter turnout and much less duplication in their poll. Today, they ask how Cablecom's price reduction will affect the ADSL resellers. But, ironically, that feature is the root of the technical glitch - if you don't answer the poll, the headlines come in fine. Too bad for this feature.
Jay Allen (of MT-Blacklist fame) announced a new project to fight blog spam. And Mark Pilgrim warns him to mind the people he picks a fight with.
I agree with Mark's points, but I think he (both of them, actually) misses an important difference between other forms of spam and this comment spam: The primary targets here are not the hundreds of thousands of blogs and their millions of readers. The targets are a very few companies: search engines that use link analysis to rank results, foremost Google.
The goal of the spammers is to generate a high PageRank with good anchor texts for their sites in a short time. Often from scratch, as they will probably have to abandon their domains just as quickly. And as Mark explains, the people behind the spam are profit-oriented businesses. If the method fundamentally stops working, they will stop spending time and money on it.
And the primary targets, e.g. Google, are in a good position to detect them even more quickly: Most comment systems follow a similar convention, especially for dating the comments. Given a deep crawl of the web, it is possible to count the number of comments a particular blog owner made on any given day. Set a threshold at something humanly possible and ignore any links above that level. (Similarly for other links mentioned in the comments; correlate them with the emergence of identical links outside of blog comments.) If spammers start linking to subpages or subdomains instead of homepages, treat them the same (whitelist blog hosters). Using many domains will quickly become too costly and if not, limit the total effect of indirect incoming googlejuice per day.
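A rough sketch of that daily threshold, in Python. Everything here is an assumption for illustration: the crawl is reduced to (commenter URL, date) pairs, the function name is made up, and the limit of 20 comments a day is an arbitrary guess at "humanly possible". I obviously don't know how a real search engine would wire this into its ranking.

```python
from collections import Counter


def credited_links(comments, max_per_day=20):
    """Return how many comment links count for each commenter URL.

    comments: iterable of (commenter_url, date_string) pairs taken from a crawl.
    Links beyond max_per_day on a single day are simply ignored.
    """
    per_day = Counter((url, day) for url, day in comments)
    credited = Counter()
    for (url, _day), count in per_day.items():
        credited[url] += min(count, max_per_day)
    return credited


# Invented example: one spammer with 500 comments in a day, one human with 3.
crawl = [("http://spam.example/", "2003-11-12")] * 500
crawl += [("http://human.example/", "2003-11-12")] * 3

for url, count in credited_links(crawl).items():
    print(url, count)
```

With the cap in place, posting 500 comments buys the spammer no more than posting 20, which is exactly the kind of economics that should make the effort pointless.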
Others proposed disabling the PageRank effect by redirecting all outgoing links through a script hidden by robots.txt. I fear that - unless this idea gains immense popularity - this won't help, as it is still easier to spam than to detect beforehand that the spamming attempt is futile. And it breaks an important part of the web.
I applaud Jay Allen's initiative, but I think the most powerful defender of the blogosphere's comment sections is the ultimate target of the spam attacks: Google. (And it would probably be the first antispam measure that becomes more effective by being publicly announced!)
In any case, the idea of keeping a worldwide daily comment count might also be useful to populate the blacklist.
Update: By no means do I suggest that the blog community should sit back and wait for Google to fix the problem. But I do think that the problem would be most efficiently fixed at Google, if only because all other anti-spam solutions would be limited to blogs installing extra software.
Indeed, I think that the idea of global daily comment counts should be considered within the blam project. By the nature of the attack, spammers will reveal themselves with an inhumanly high number of comments in the blogosphere.
An additional way to find spammers is to look for identical or near-identical comments (again, works for Google and blam).
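One common way to catch near-identical texts is to compare their overlapping word sequences (shingles) with the Jaccard measure. The sketch below is just that textbook technique in Python, not anything blam or Google is known to use; the shingle size of three words and the sample comments are my own assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of two comments: shared shingles / all shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    return len(sa & sb) / len(sa | sb)


# Invented sample comments.
spam_a = "Great post! Visit cheap pills online at my site today"
spam_b = "Great post! Visit cheap pills online at my site now"
honest = "I disagree with the second paragraph, the crawl would miss most comments"

print(similarity(spam_a, spam_b))  # high: only the last word differs
print(similarity(spam_a, honest))  # no shared three-word sequence
```

A crawler would not compare every pair of comments like this, it would need some indexing trick to scale, but the measure itself is this simple.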
A nice side-effect of the project would be a Technorati for comment space! Indeed, that would allow me to aggregate all the comments I spread over the web back into my blog. If blam decides against implementing something like this because it is beyond scope, then I forward this request to the oh, dear LazyWeb.
Here is an observation that I have often made in the last years and that was recently confirmed: Intelligence in search software is largely only as useful as it is predictable by the user.
But let me digress for a minute: A problem users (novices and power-users alike) consistently mention when asked about their search experience is "too many results". I used to reduce this to the problem of increasing relevancy, thinking that as soon as people find what they want in the first results, they will stop worrying about that number at the top of their result set. All the more so, as the same people often consider features like stemming and inflection, automatic query expansion through synonyms and latent semantic indexing (LSI) useful and request them once they are explained to them, although all of them actually increase the number of results.
But dismissing this feedback so easily is dangerous. So what is the real issue with "too many results"? I think we have to look into more psychological factors. Things I hear here are along the lines of "I don't know whether important results are in the deeper results" or "I don't trust the search engine to do what I meant". I think a plausible theory is that users see their search engine as a tool for their thought process, whose output is only useful insofar as they can fit it into a bigger picture, and that this is limited by how far they can comprehend what the tool actually does and doesn't do.
I think this problem magnifies in fields like law (relevant laws, previous cases), patents (prior art), medical records, etc. Generally it is more important in searches other than global web searches, but even there you don't see much movement towards more linguistically magical technologies. Of course, where recall is important - and it would be important in exactly law, patents, medical records, etc. - such technologies seem more and more appropriate. Still, given a choice many users will choose the "less intelligent" search engine.
The challenge lies in constructing user interfaces that better explain what the search engine did (and didn't) without overwhelming the user. A nice example of a specialty search engine that does this is Yahoo! Shopping's new SmartSort. Look at how they handle digital cameras: My first result is commented as "Pentax Optio S4 is a subcompact camera. It is ranked first because it has the highest Optical Zoom compared to the others in your top 10 results. This Digital Camera is more compact than Casio EXILIM EX Z4U." followed by "Casio EXILIM EX Z4U is a subcompact camera. It is ranked second because it has the least expensive price and the highest Optical Zoom compared to the others in your top 10 results. This Digital Camera is cheaper and is smaller than Fuji FinePix F700 which is displayed next.".
How could this look in a more general search engine? "All following results are less popular than the ones already displayed" or "The previous results are considered authorities on the subject, the following results mention the subject in other contexts"? Probably too much text, but then it is only needed in cases of urgent curiosity. Maybe an "explain" link is enough, just as Nutch has.
Another important point is that transparency empowers. Intelligence like LSI works only for the problem it was optimized for. The more "stupid" tools are, in practical use, more versatile (see also the argument behind world of ends). Watch your thought process when formulating queries: I tend to mix descriptions of the subject with terms I expect on matching pages, often shortening longer product names to parts I think are unique enough. And think of how you use a search engine for tasks other than document retrieval: Spell checking, comparing usage of combinations of words, checking prevalent meanings by glancing at excerpts, even preparing an actual query by (manually!) inferring synonyms from result sets. None of this is possible if I have no real idea what the search engine does.
What does this mean in the context of precision and recall? Increase recall with methods that are transparent and predictable in the way they work, avoid magic. Increase precision with transparent, predictable tools (e.g. intuitive measures of popularity; e.g. partition the result set along well known boundaries).
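For readers who don't juggle these two terms daily, here is a small Python sketch of the usual definitions: precision is the share of returned results that are relevant, recall is the share of relevant documents that were returned. The document ids and the two result sets are invented; they only illustrate how a query expansion typically trades one for the other.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents are actually relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}
exact_match = {"d1", "d2"}                        # plain keyword search
expanded = {"d1", "d2", "d3", "d5", "d6", "d7"}   # after synonym expansion

print(precision_recall(exact_match, relevant))  # (1.0, 0.5)
print(precision_recall(expanded, relevant))     # (0.5, 0.75)
```

The expansion finds one more relevant document but buries it among three irrelevant ones, and unless the user can see why those extra results showed up, that looks like the engine not doing what was meant.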
Does this match your experiences with search? Is there research that points in a similar direction?
Recently, efforts to strengthen the role of open source in our government have been increasing.
One thing that in my opinion absolutely must be open source for auditing purposes is voting software. Not only vote-over-the-web software but also computer-based voting in polling booths. See Phil Windley's essay. Here in Switzerland even that is (AFAIK) quite far away, but it seems that other countries are much further along. And they don't follow that rule, and already they seem to have huge problems.
E.g. Diebold sold the state of Ohio their voting software at the same time as their CEO promises to be "committed to helping Ohio to deliver its electoral votes to the president next year". Talk about a conflict of interest! And not just that, their contract explicitly forbids the state to even touch the machines, let alone examine them to verify whether they work correctly. Hello?
Almost predictably, the New Zealand Herald carries a report according to which many tight races in key states were unexpectedly won by Republicans, even though pre-polling, exit polling and historic patterns would predict otherwise. In just too many cases not to raise doubt. And, conveniently, there is no way to verify (the machines have printers to produce a written record, but they were not activated).
Then internal memos leak, with thousands of emails describing grave problems that operators of the machines have. People publishing them for the public conscience are immediately sued under the DMCA. Luckily I live outside their jurisdiction, so here you have the zip file. Scoop has more commentary on it, including the most interesting quotes.
Can anyone tell me why this story didn't spread further? Does it fit the current meme (Republicans, liars, etc.) too well to be interesting? I mean, this stuff is at the heart of democracy, and the whole world laughed at US voting procedures three years ago already!? Or is it too risky to suspect the GOP of election fraud, and in the end just this political risk saves them from a proper investigation? (I mean, with all the distrust in the honesty of their goals I already have, I still find myself having a problem believing that they are that corrupt.)
In any case, voting procedures have to adhere to the highest standards of transparency, and this is what has been done in paper voting for decades and even centuries now. The way to do this in electronic voting is completely open source software and sound principles like logging to reliable write-once media (e.g. paper).
An interesting presentation on connecting learning objects, which consists of an (imaginary) use-case of blog technology in (e)learning environments.
It focuses on the distribution of learning objects (enhanced through RSS), collaboration among educators (through Weblogs) and automatic enhancement of the meta data surrounding the learning objects (through Trackbacks when they are used or discussed, building a library of example uses). Trackback as a protocol is really simple to implement and this presentation shows how applications other than blogs can benefit from it.
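To show how little there is to it, here is a sketch of a Trackback ping in Python using only the standard library. As far as I understand the protocol, a ping is a form-encoded HTTP POST with the fields url, title, excerpt and blog_name, answered by a tiny XML document whose error element is 0 on success. The function names and the example URLs are my own, hypothetical choices, and a learning object repository would of course use its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_ping(ping_url, url, title="", excerpt="", blog_name=""):
    """Build the HTTP POST request for a Trackback ping (nothing is sent yet)."""
    body = urlencode({
        "url": url,              # permalink of the entry that refers to the target
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return Request(ping_url, data=body, headers=headers, method="POST")


def parse_response(xml_bytes):
    """Return (success, message) from the XML answer of the pinged server."""
    root = ET.fromstring(xml_bytes)
    error = (root.findtext("error") or "1").strip()
    message = (root.findtext("message") or "").strip()
    return error == "0", message


def send_ping(ping_url, url, **fields):
    """Send the ping over the network and report whether it was accepted."""
    with urlopen(build_ping(ping_url, url, **fields), timeout=10) as response:
        return parse_response(response.read())
```

A course page that reuses a learning object would call something like send_ping("http://repository.example/tb/42", "http://course.example/week3", title="Week 3") (both URLs hypothetical), and the repository would add the course page to the object's list of example uses.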