Michal Čihař - Archives

Playing with HTML microdata

Ever since Google introduced support for "Rich snippets", I've wanted to play with that technology a bit. I've used microformats in the past, but today's preferred solution seems to be microdata, which plays nicely with HTML 5.

To have some real data to show, I've chosen spolecneaktivity.cz, a community site for free-time activities (sorry, it is Czech only). It provides an XML export of some data to play with.

After a little bit of hacking, I wrote a Python script that parses the export and outputs HTML with all the microdata details I could find in the original XML. The output now lives at http://cihar.com/aktivity/ and Google's Structured Data Testing Tool seems to parse it just fine.

The only question is whether it will ever show up in the search results, as the tool warns "urls are pointing to a different domain than the base url".
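To give an idea of what such a script can look like, here is a minimal sketch. The XML layout (`activities`, `activity`, `name`, `url`, `start`, `place`) is a made-up stand-in, since the real export format is not shown here, and the choice of the schema.org `Event` vocabulary is an assumption as well.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical export format; the real XML export is not reproduced here.
SAMPLE_XML = """<activities>
  <activity>
    <name>Evening run</name>
    <url>http://example.com/run</url>
    <start>2012-10-01T18:00</start>
    <place>Prague</place>
  </activity>
</activities>"""


def activity_to_html(node):
    """Render one <activity> element as an HTML fragment with microdata."""
    name = escape(node.findtext("name", default=""))
    url = escape(node.findtext("url", default=""), quote=True)
    start = escape(node.findtext("start", default=""), quote=True)
    place = escape(node.findtext("place", default=""))
    return (
        # Assumed vocabulary: schema.org Event
        '<div itemscope itemtype="http://schema.org/Event">\n'
        f'  <a itemprop="url" href="{url}"><span itemprop="name">{name}</span></a>\n'
        f'  <time itemprop="startDate" datetime="{start}">{start}</time>\n'
        f'  <span itemprop="location">{place}</span>\n'
        '</div>'
    )


def export_to_html(xml_text):
    """Parse the whole export and join the fragments for all activities."""
    root = ET.fromstring(xml_text)
    return "\n".join(activity_to_html(node) for node in root.iter("activity"))


print(export_to_html(SAMPLE_XML))
```

Escaping every value before it goes into the markup matters here, because the exported data is user-generated content.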

Weblate 1.3 on l10n.cihar.com

As the Weblate release is getting closer, I've decided to give the new version more real-life testing, so it got deployed at http://l10n.cihar.com/.

This release brings quite a lot of new features. The most interesting ones for users might be:

  • Better and new consistency checking.
  • Better support for Android resources.
  • More visible data exports.
  • New buttons to enter some special characters.
  • Support for exporting dictionary.
  • Checks for source strings and support for source strings review.
  • Support for user comments for both translations and source strings.
  • Better changes log tracking.
  • Changes can now be monitored using RSS.

You can see the full list of changes in our documentation.

As usual, I hope this upgrade will go smoothly and won't cause any big problems :-).

Migrating content

I seem to have spent too much time online and used too many blogging engines. Yesterday, I completed yet another import of content to this blog, to have all my content available in a single place.

Looking at the import scripts, I already have four of them: Movabletype, Nanoblogger, Wordpress and, as a recent addition, Drupal.

For Drupal, I had to install the Node export module to have some sensible way to get the content out. Then it was just a matter of parsing XML and fixing up links. Rendering Texy! markup in Python was quite challenging, as there seems to be no native solution available. Fortunately, there is an XML-RPC service available, which made that part quite easy.

l10n.cihar.com switched to Weblate

I've just switched https://l10n.cihar.com/ to Weblate, as I think it's stable enough to replace the Pootle installation there. This change immediately affects the translation of all projects which were translated there: phpMyAdmin, Gammu, Wammu, GePeS and Ukolovnik. In addition, Weblate itself is also being translated there.

Your old usernames and passwords should be valid (in case you've activated the account and used it in the last year), but you might want to adjust your profile, as the settings were intentionally not imported (many users had a corrupted name in Pootle and I hope this will let them fix it).

Let's see how many bugs this switch will reveal, but I expect it will work quite smoothly.

Fixup of historical data

This blog consists not only of posts created in it, but also of older posts written elsewhere. The vast majority of them come from the older blog engine I used here (Nanoblogger), but almost the same number are posts written in Czech and originally published on abclinuxu.cz.

It turned out that when I was importing those, I somehow forgot (or rather did not care) to fix the links between them and the links to the rest of the abclinuxu.cz website. With a script of a few lines and a huge amount of help from BeautifulSoup, these should now be fixed. So you can properly navigate over almost seven years of my blogging history.
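The link fixup boils down to something like the following sketch. To keep it dependency-free it swaps BeautifulSoup for a plain regular expression over `href` attributes from the standard library, which is cruder than real HTML parsing. The base URL, the sample paths and the `local_map` dictionary are illustrative assumptions, not the actual data.

```python
import re
from urllib.parse import urljoin

# Assumed base URL of the site the posts were imported from.
BASE_URL = "http://www.abclinuxu.cz/"

# Naive matcher for double-quoted href attributes (not a full HTML parser).
HREF_RE = re.compile(r'href="([^"]*)"')


def fix_links(html, base_url=BASE_URL, local_map=None):
    """Point links at imported local copies, make other relative links absolute.

    local_map is a hypothetical dict mapping old paths to new local URLs.
    """
    local_map = local_map or {}

    def replace(match):
        href = match.group(1)
        if href in local_map:
            # The target was imported as well, so link to the local copy.
            new = local_map[href]
        else:
            # urljoin leaves absolute URLs untouched.
            new = urljoin(base_url, href)
        return f'href="{new}"'

    return HREF_RE.sub(replace, html)


sample = (
    '<a href="/blog/old-post">old</a> '
    '<a href="http://example.com/">external</a> '
    '<a href="/clanky/x">article</a>'
)
fixed = fix_links(sample, local_map={"/blog/old-post": "/archives/old-post/"})
print(fixed)
```

Links to other imported posts end up pointing to this blog, while everything else is turned into an absolute link to the original site.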

While looking at other failing URLs here, I've also added a number of redirects, so that most people end up on the pages they expect :-).

Accidental SEO

Browsing the statistics of what users search for on my blog (or rather what they typed into a search engine before they came here) is usually quite interesting.

More than a month ago, I wrote up my experience with renovating a flat in a panel building, and that post is now a source of many surprises. I certainly did not expect the post about Panelreko to show up among the very first results when searching for the company name.

An even bigger surprise, however, is the search phrase "jak udělat botník" ("how to make a shoe rack"). Definitely do not look for instructions on building a shoe rack in my review of wardrobes from Amonit, because if I were to build a shoe rack myself, it would turn out considerably worse.

Tiny blog improvements

I've finally found some time to make some improvements to this blog engine.

The first thing I was missing more and more was reasonable content for the description in HTML headers. This led to sites like Google+ always fetching the same constant text, which did not look good. So now there is something really relevant to the content: either the first paragraph (which is the case for all old entries) or something I can edit manually (in case I'm not too lazy).

For the automatic extraction I tried several approaches. The first attempt was simply extracting the first paragraph from Markdown, but I quickly remembered that quite a lot of older posts were written directly in HTML, which made this a big failure. Later I tried BeautifulSoup on the rendered HTML, which led to missing spaces around text that was originally in links. Finally I discovered html2text, which was pretty easy to use and does what I need.

In the end, the code looks like the following:

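import html2text

def get_auto_summary(self):
    # Convert the rendered HTML body to plain text
    h2t = html2text.HTML2Text()
    h2t.body_width = 0  # no line wrapping, each paragraph stays on one line
    h2t.ignore_images = True
    h2t.ignore_links = True
    h2t.ignore_emphasis = True
    text = h2t.handle(self.body.rendered)
    # The first line of the stripped text is the first paragraph
    lines = text.strip().splitlines()
    return lines[0] if lines else ''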

The other change affects Flattr. As providing Flattr links for each post is now possible even without JavaScript (it's called Auto-submit URL), I've decided to implement them this way. Let's see how many things will get on Flattr this way (at least it will give me a better indication of what somebody considered useful). Thanks to the commenters on my older blog post for the hints.
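Building such a link is just a matter of URL-encoding a few parameters. The sketch below assumes the `https://flattr.com/submit/auto` endpoint with `user_id`, `url` and `title` query parameters as I remember them from the Flattr documentation of the time; the user name and post URL are placeholders.

```python
from urllib.parse import urlencode


def flattr_auto_submit_url(user_id, url, title=None):
    """Build a plain link that submits the given URL to Flattr when followed."""
    # Parameter names are an assumption based on the Auto-submit URL format.
    params = {"user_id": user_id, "url": url}
    if title:
        params["title"] = title
    return "https://flattr.com/submit/auto?" + urlencode(params)


link = flattr_auto_submit_url(
    "example-user",
    "http://blog.example.com/post/",
    "Example post",
)
print(link)
```

The resulting URL can be placed into an ordinary `<a href="...">` in the post template, so no script has to be loaded on the page.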

Good bye Ad Bard

For some time, I've shown Ad Bard ads on various sites related to free software that I run (this blog, the Wammu website or the phpMyAdmin demo server). I saw it as a not too intrusive way of paying the hosting fees for the server which runs all of this. I also found it a good idea to promote free software friendly businesses, but I was always a bit skeptical about its success.

Today the service is supposed to be shut down, though the web site does not mention anything like that so far. It was announced to the publishers a month ago, though, and I expect it to be true. The reported reason is lack of time to focus on improving the network and selling advertisements.

I've removed the ads from all my servers and do not plan to replace them with anything else for now. So you can enjoy the sites ad-free even without AdBlock :-).

Imported old Czech posts

Simply to have all my blog posts in a single place, I've decided to import all Czech blog posts from abclinuxu.cz into this blog. You can now find them in the archives, all of them tagged with Czech.

There is no intention to add new blog posts in Czech for now. In case I change my mind sometime in the future, all English posts are tagged with English.

Changed website

After the last fixes yesterday evening, I've decided to put the new website online today. It matches the layout of my blog and uses the same technology underneath (Django).

The biggest change is probably the simplification of the structure and the cleanup of old unused stuff. I tried not to miss anything, but I'm sure something will be discovered in the next few days.

With this change, I've also shut down my SVN server, as everything has been migrated either elsewhere (usually to Debian's collab-maint SVN repository) or to Git, on either Gitorious or repo.or.cz.