Friday, July 31, 2009

Collection Types in Python 3

I would like to document some useful snippets for handling collections in Python 3.x.
set:

To add an item to a set:

s.add(x)


To check whether an item is in a set:
x in s  # True if set s contains item x


To remove an item from a set:
s.remove(x)  # raises KeyError if x is not in s
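
Here is a small runnable sketch that puts the three operations together. The item names are made-up sample values, and s.discard(x) is shown as an alternative to s.remove(x) for the case where the item may be missing.

```python
# Hypothetical sample values; any hashable items behave the same way.
s = {"apple", "banana"}

s.add("cherry")             # add an item; adding an existing item changes nothing
has_apple = "apple" in s    # membership test

s.remove("banana")          # remove an item; raises KeyError if it is absent
s.discard("durian")         # like remove, but silently ignores a missing item

print(has_apple)            # True
print(sorted(s))            # ['apple', 'cherry']
```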

Awk Command in Linux

awk is a very useful command. I will document some very simple but useful examples here. As a rule of thumb, an awk program has the form "pattern { action }": the action runs on every line that matches the pattern.

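#Extract the first column when the lines are comma-separated
#(use -F so the separator also applies to the first line; setting FS inside the main block is too late for it)

awk -F',' '{print $1}'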


#Check whether the second column is "Artists"; if so, print the first and second columns.
#(== compares; a single = would assign "Artists" to $2 and match every line)

awk -F'\t' '$2 == "Artists" {print $1 "\t" $2}'
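
For readers who think in Python, the intent of the second command (print the first two columns when the second column equals "Artists") can be sketched roughly as below. The function name filter_artists and the sample rows are my own assumptions for illustration; this is not how awk works internally.

```python
def filter_artists(lines):
    # The "pattern" is the if-test, the "action" is the append.
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")   # tab-separated columns
        if len(fields) >= 2 and fields[1] == "Artists":
            out.append(fields[0] + "\t" + fields[1])
    return out

# Hypothetical sample rows
sample = ["1\tArtists\textra", "2\tAlbums", "3\tArtists"]
for row in filter_artists(sample):
    print(row)
```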

Randomly pick n lines from a text file

I searched online but could not find an easy way to do this from the command line. I often want to randomly sample lines from a text file without duplicates. Here is my Python code.

#!/usr/bin/env python3
# Randomly choose n distinct lines from a text file (sampling without replacement).

import random
import sys

# Optional prefix that is added to the front of each output line
prefix = ""

if len(sys.argv) not in (3, 4):
    print("syntax: randomlines.py <input_file> <num_of_sample> [prefix]")
    sys.exit(1)

input_file = sys.argv[1]
num_of_sample = int(sys.argv[2])
if len(sys.argv) == 4:
    prefix = sys.argv[3]

# Read the whole file into memory, one list entry per line
with open(input_file, "r") as f:
    lines = f.read().splitlines()

if len(lines) < num_of_sample:
    # Warn on stderr so the message does not mix with the sampled lines
    print("The number of random lines you asked for is larger than the number of lines in the target file.\n"
          "Shrinking it to the number of lines in the target file automatically.\n"
          "-----------------------------------------------",
          file=sys.stderr)
    num_of_sample = len(lines)

# range(len(lines)) yields 0 .. len(lines)-1; random.sample never repeats an index
chosen_lines_num = random.sample(range(len(lines)), num_of_sample)
for i in chosen_lines_num:
    print(prefix + lines[i])
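
The script above reads the whole file into memory. If the file is too large for that, one alternative is reservoir sampling (Algorithm R), which keeps only n lines in memory at a time. This is a different technique from the script above, and the function name reservoir_sample and the demo data are my own assumptions; treat it as a minimal sketch.

```python
import random

def reservoir_sample(iterable, n):
    """Return up to n items chosen uniformly at random, without duplicates."""
    reservoir = []
    for index, item in enumerate(iterable):
        if index < n:
            # Fill the reservoir with the first n items
            reservoir.append(item)
        else:
            # Keep the new item with probability n / (index + 1)
            j = random.randint(0, index)   # inclusive on both ends
            if j < n:
                reservoir[j] = item
    return reservoir

# Hypothetical in-memory demo; an open file object can be passed the same way
demo_lines = ["line1", "line2", "line3", "line4", "line5"]
picked = reservoir_sample(demo_lines, 3)
print(picked)
```

When an open file is passed instead of a list, each sampled line keeps its trailing newline, so it may need a call to rstrip("\n") before printing. Unlike random.sample, the result is not in random order, only the choice of lines is random.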

Wednesday, July 29, 2009

Enable LaTeX support in Blogspot

Here is an example:
$$\pi = \int_{0}^{1} \frac{4}{1+x^{2}} \, dx$$
becomes:
<a latex equation in pic>

All the details you need are explained here.

*Update*: The previous method no longer works because http://www.forkosh.dreamhost.com refuses to serve other public websites :)

Sorting Python Dictionary by Value

This is a slower version:

sorted(d.items(), key=lambda kv: kv[1])  # Python 3 no longer accepts "lambda (k, v): v"


This is a faster version:

from operator import itemgetter
sorted(d.items(), key=itemgetter(1))


source: here
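
As a quick runnable check of the itemgetter version, here is a minimal sketch with made-up data. The dictionary contents are hypothetical; reverse=True is a standard argument of sorted() that gives descending order.

```python
from operator import itemgetter

d = {"pear": 3, "apple": 1, "fig": 2}   # hypothetical data

# itemgetter(1) picks the value out of each (key, value) pair
by_value = sorted(d.items(), key=itemgetter(1))
by_value_desc = sorted(d.items(), key=itemgetter(1), reverse=True)

print(by_value)        # [('apple', 1), ('fig', 2), ('pear', 3)]
print(by_value_desc)   # [('pear', 3), ('fig', 2), ('apple', 1)]
```

Note that sorted() returns a list of (key, value) tuples, not a dictionary.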

However, what if each value is a list, and we would like to sort by the items inside the list?

For example, we have a dictionary variable x:
>>> x
{'a': [1, 2, 3], 'b': [0, 3, 1]}


We can sort them with the following commands:
>>> sorted(x.items(), key = (lambda k: k[0])) #sort by key
[('a', [1, 2, 3]), ('b', [0, 3, 1])]
>>> sorted(x.items(), key = (lambda k: k[1])) #sort by value
[('b', [0, 3, 1]), ('a', [1, 2, 3])]
>>> sorted(x.items(), key = (lambda k: k[1][0])) #sort by first item in value(a list)
[('b', [0, 3, 1]), ('a', [1, 2, 3])]
>>> sorted(x.items(), key = (lambda k: k[1][1])) #sort by second item in value(a list)
[('a', [1, 2, 3]), ('b', [0, 3, 1])]
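
The pattern above can be wrapped in a small helper so the list position becomes a parameter. The name sort_by_list_index is my own invention for illustration, and the sketch assumes every list is long enough to have the requested index.

```python
def sort_by_list_index(d, index, reverse=False):
    # Hypothetical helper: sort (key, value) pairs by value[index]
    return sorted(d.items(), key=lambda kv: kv[1][index], reverse=reverse)

x = {'a': [1, 2, 3], 'b': [0, 3, 1]}

by_third = sort_by_list_index(x, 2)                       # compare 3 and 1
by_second_desc = sort_by_list_index(x, 1, reverse=True)   # compare 2 and 3, largest first

print(by_third)         # [('b', [0, 3, 1]), ('a', [1, 2, 3])]
print(by_second_desc)   # [('b', [0, 3, 1]), ('a', [1, 2, 3])]
```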