Michal Čihař - Tiny blog improvements

Tiny blog improvements

I've finally found some time to make a few improvements to this blog engine.

The first thing I increasingly missed was reasonable content for the description in the HTML headers. Without it, sites like Google+ always fetched the same constant text, and that did not look that good. So now the description is really relevant to the content: either the first paragraph (which is the case for all old entries) or something I can edit manually (in case I'm not too lazy).
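The fallback between a hand-written description and the automatic one can be sketched roughly like this. The function name `meta_description` and both parameter names are hypothetical and not taken from the blog engine, and the sketch assumes Python 3 for `html.escape`:

```python
from html import escape


def meta_description(manual_summary, auto_summary):
    """Build a description <meta> tag, preferring the hand-written summary.

    Both arguments are hypothetical inputs: the manually edited text
    (possibly empty or None) and the automatically extracted first paragraph.
    """
    summary = (manual_summary or '').strip() or auto_summary.strip()
    # Escape quotes and angle brackets so the text is safe inside an attribute
    return '<meta name="description" content="%s" />' % escape(summary, quote=True)
```

Old entries without a manual summary simply pass an empty string as the first argument and get the extracted paragraph.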

For automatic extraction I tried several approaches. The first attempt was simply extracting the first paragraph from the Markdown source; however, I quickly remembered that quite a lot of older posts were written directly in HTML, which made this a big failure. Next I tried BeautifulSoup on the rendered HTML, which led to missing spaces around text that was originally in links. Finally I discovered html2text, which was easy to use and does what I need.

In the end the code looks like this:

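import html2text

def get_auto_summary(self):
    h2t = html2text.HTML2Text()
    # Disable wrapping so that each paragraph stays on a single line
    h2t.body_width = 0
    # Drop images, link targets and emphasis markers from the output
    h2t.ignore_images = True
    h2t.ignore_links = True
    h2t.ignore_emphasis = True
    text = h2t.handle(self.body.rendered)
    # The first non-empty line is the first paragraph
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[0] if lines else ''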
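For comparison, here is a dependency-free sketch of the same idea using only `html.parser` from the Python 3 standard library. This is a swapped-in technique, not what the blog uses: the names `FirstParagraph` and `first_paragraph` are made up for illustration, it assumes the rendered HTML wraps text in properly closed `<p>` elements, and unlike html2text it ignores headings and lists.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first non-empty <p> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False   # True while inside a <p>
        self.done = False   # True once a non-empty paragraph has closed
        self.parts = []     # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == 'p' and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == 'p' and self.in_p:
            self.in_p = False
            if ''.join(self.parts).strip():
                self.done = True
            else:
                self.parts = []  # skip empty paragraphs

    def handle_data(self, data):
        # Text inside links and emphasis arrives here unchanged,
        # so the spaces around it are preserved.
        if self.in_p and not self.done:
            self.parts.append(data)


def first_paragraph(html):
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace, including newlines, into single spaces
    return ' '.join(''.join(parser.parts).split())
```

Calling `first_paragraph()` on a rendered body returns the plain text of its first paragraph, or an empty string when no paragraph is found.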

The other change affects Flattr. Since providing Flattr links for each post is now possible even without JavaScript (it's called Auto-submit URL), I've decided to implement them this way. Let's see how many things get on Flattr this way (at least it will give me a better indication of what somebody considered useful). Thanks to the commenters on my older blog post for the hints.
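Building such a link could look roughly like the sketch below. The endpoint and the query parameter names (`user_id`, `url`, `title`, `language`) are written from memory of Flattr's auto-submit documentation and should be treated as assumptions to verify; the helper name `flattr_url` and the `LANGUAGE_MAP` table are hypothetical. The language mapping mirrors the en_US to en_GB translation mentioned in the comments.

```python
from urllib.parse import urlencode

# Assumed endpoint, not taken from this post; check Flattr's documentation
FLATTR_AUTOSUBMIT = 'https://flattr.com/submit/auto'

# Hypothetical table of locales that need translating before submission
LANGUAGE_MAP = {'en_US': 'en_GB'}


def flattr_url(user_id, url, title, language='en_GB'):
    """Return an auto-submit link for one post (parameter names assumed)."""
    params = [
        ('user_id', user_id),
        ('url', url),
        ('title', title),
        ('language', LANGUAGE_MAP.get(language, language)),
    ]
    # urlencode percent-encodes each value and encodes spaces as '+'
    return FLATTR_AUTOSUBMIT + '?' + urlencode(params)
```

Because the result is a plain link, it works in feeds and for readers with JavaScript disabled.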

Comments

wrote on Jan. 13, 2012, 9:20 a.m.

Hi, It looks like the auto-submit URL is broken, I get an error message "en_US invalid language...".

wrote on Jan. 13, 2012, 9:33 a.m.

Ah, there was still one place where I had forgotten the translation of en_US to en_GB. Thanks for noticing, it should be fixed now.