Janitor Programmer: 2010

Thursday, December 16, 2010

The problem with JavaScript

Let me open this by saying that I like JavaScript. The syntax is comfortable for a lot of people, and it lets you do all kinds of funky stuff because functions are first-class objects and inheritance is prototype based. It also enjoys an extremely wide install base because it is in almost every browser people use. I believe it has a bad reputation mostly because it has suffered from years of poor and inconsistent implementations. I also think it bears the brunt of having a variety of DOMs (these days, IE vs. everybody else) associated with it, much the same way that C++ has a bad reputation because a lot of people's experience with C++ began and ended with MFC.

But there is one problem JavaScript has that became clear to me after reading this critique of Google Closure. The problem, in a nutshell, is that in many instances the developer has to act as an optimizing compiler. Most of the critiques revolve around optimizations that programmers in the 21st century either are accustomed to having taken care of for them, or don't care about because their computing power is so great compared to what they are doing. At this point, an interpreted language running in a browser is handicapped in both of these areas.

Looking at the list of critiques, slow loops (because you access a property on every loop check), slow case statements and dealing with multiple types of strings are all things that most developers don't have to worry about anymore. The problems with the code in the library, and more efficient ways to do the same thing, cover a large part of the post. That you can write many paragraphs on this sort of thing, and that these optimizations are necessary, shows that implementations of the language still have plenty of room for improvement. The concatenation of "" to a value to make it a string more efficiently than using String() is, to me, the most egregious example of this. (somevalue + "").replace() may be fast, but it sure is ugly to look at.

The author at one point even takes a jab at Google, saying their time may have been better spent just writing proper JavaScript instead of investing in making Chrome's JavaScript performance better. I'm guessing this is tongue-in-cheek humor. Otherwise, the implication is that JavaScript is fine on the performance front. As long as the programmer has to perform the sorts of optimization hacks described in that post, that is simply not the case.

Thursday, December 2, 2010

random subsets of data ii

In doing more work to get random subsets, I found that the previous solution fails miserably if you are going to be using a view. You can run the EXEC statement in a stored procedure, but you cannot use it in a view. Furthermore, calling a stored procedure from a view is somewhere between extremely problematic and impossible, depending on who you talk to. Using the RANK function and partitioning the data, you can get a similar result.

Let us use the same idea as the previous example and assume we want at least one employee from every department. The only requirement for the following query is that you ask for at least as many employees as you have departments.


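select top 100 T1.*
from (
    -- r numbers the active employees 1, 2, 3, ... within each
    -- department, in random order (newid() is a fresh GUID per row)
    select
        RANK() over (partition by Department order by newid()) as r,
        *
    from Employees
    where Active = 1
) as T1
-- every department's r = 1 row comes first, then the r = 2 rows, and so on
order by T1.r, T1.Department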

Note, as before, that this is a SQL Server query; modifications may be needed for other flavors of SQL. For every department, the employees are ranked randomly thanks to order by NEWID(). By selecting the top 100 and ordering by rank and then department, you get all of the 1's from every department, then all of the 2's, and so on until you reach 100.

The order by department is strictly unnecessary. I did it to make viewing and verifying the results easier.
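To make the ranking idea concrete outside the database, here is a minimal Python sketch of the same selection logic. It assumes the rows are already filtered to active employees and are dicts with a "Department" key (a hypothetical shape standing in for the Employees table). random.shuffle plays the part of order by newid(), and each row's position after shuffling plays the part of RANK(). The function name is made up for illustration.

```python
import random
from collections import defaultdict


def random_subset_by_rank(rows, n, key="Department", rng=random):
    """Mimic RANK() OVER (PARTITION BY key ORDER BY NEWID()) plus TOP n."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    ranked = []
    for dept, members in groups.items():
        shuffled = members[:]
        rng.shuffle(shuffled)  # random order within each department
        for r, row in enumerate(shuffled, start=1):
            ranked.append((r, dept, row))  # r stands in for RANK()

    # ORDER BY r, Department: all rank-1 rows first, then rank-2 rows, ...
    ranked.sort(key=lambda t: (t[0], t[1]))
    return [row for _, _, row in ranked[:n]]
```

As long as n is at least the number of departments, every department lands in the result, because all of the rank-1 rows sort ahead of everything else.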

Friday, November 19, 2010

random subsets of data

I recently had a request to return a random subset of contacts from a database, but to be sure that there were some contacts from each group represented. Let us say that you have a database with employees and you want to get a list of some random people from each department. Here is how I went about doing it:



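declare @s1 varchar(max)

-- Build one "random top 11 percent" query per department and glue them
-- together with UNION ALL. coalesce() keeps ' union all ' from appearing
-- before the first query. replace() doubles any single quotes in a
-- department name, quotename() makes a safe alias, and NULL departments
-- are skipped because a NULL piece would wipe out what @s1 has collected.
select @s1 = coalesce(@s1 + ' union all ', '') + e
from (
    select distinct
        'select * from (select top 11 percent * from Employees where Department = '''
        + replace(Department, '''', '''''')
        + ''' order by newid()) as ' + quotename('t' + Department) as e
    from Employees
    where Department is not null
) T1

-- Run the generated query
exec (@s1)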

What we are doing here is generating the text of a query that selects a random 11 percent of the employees in each department. Assigning to @s1 once per row of the inner select concatenates the per-department queries into one big query, where union all combines their results into a single table. The coalesce function keeps the first query from being prefixed with ' union all '. The exec function then executes the generated query.

Note that this is a T-SQL example to run on SQL Server. You may have to translate it, depending on your database system.
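As a check on what the generated query returns, the per-department percent sample can be sketched in Python. This assumes rows are dicts with a "Department" key and that percent is an integer. It rounds each department's share up, which is how I understand SQL Server's TOP ... PERCENT to behave. That rounding is why even a one-person department contributes a row. The function name is hypothetical.

```python
import random
from collections import defaultdict


def percent_per_group(rows, percent, key="Department", rng=random):
    """Pick a random `percent` (an int) of the rows in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = []
    for members in groups.values():
        # Integer ceiling of len * percent / 100, capped at the group size
        take = min(len(members), (len(members) * percent + 99) // 100)
        result.extend(rng.sample(members, take))  # random rows from this group
    return result
```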

If you want a fixed number, what I would do is tweak the percent to get a number that is slightly more than the number desired, then select the top N from that. This runs the risk of not getting somebody from every department. Another method would be to select a number slightly less than your target, then add enough randomly chosen records to reach it. This is more complex, but may be what you need.
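Here is a minimal Python sketch of the second method, assuming the "slightly less than your target" first pass is simply one random employee per department. After that pass, the rest of the target is filled with random picks from everybody who was not chosen. Rows are assumed to be dicts with a "Department" key, and the function name is hypothetical.

```python
import random
from collections import defaultdict


def fixed_size_subset(rows, n, key="Department", rng=random):
    """Return n random rows with at least one row from every group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    if n < len(groups):
        raise ValueError("n must be at least the number of departments")

    picked, leftovers = [], []
    for members in groups.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        picked.append(shuffled[0])      # guaranteed representative
        leftovers.extend(shuffled[1:])  # everyone else goes into the pool

    # Top up with random extras, never asking for more than are left
    extra = min(n - len(picked), len(leftovers))
    picked.extend(rng.sample(leftovers, extra))
    return picked
```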

Friday, October 29, 2010

Returning data tables from Web Services in .NET

This is just a small reminder, and I don't know why it gets me every time, but it does. I guess I return data tables from web services infrequently enough that I don't think about it. Maybe writing it down somewhere will help me remember: when creating a data table to return from a web service, give the table a name.


DataTable dt = new DataTable("nameMyTable");

If you don't, the first time you hit the service when testing you will get an exception about the table not having a name. And then you will curse to yourself and then give it a name and move on with your life.

Friday, October 22, 2010

scripts for processing stuff

Often I need to write something up quickly to process a file, or the files in a directory, and I'll want a GUI for these things so I can pass them along to other people to use. Here are two Python scripts that I use as templates for picking either a file or a directory and then doing something with it. The object here isn't to have everything "correct" according to the latest design fashion or methodology. With these scripts I can import a Python module that takes a file or directory name and does the processing I want, plug in the function name, pass the parameter and then move on with my life.

Pick a file and do something with it:


from tkinter import *
from tkinter.messagebox import *
from tkinter.filedialog import *

class FileProcessor(Frame):

    # Object constructor
    def __init__(self, parent=None):
        Frame.__init__(self, parent)

        self.txtFile = Entry(parent)
        self.txtFile.place(x=6,y=64,width=375,height=24)
        self.txtFile.insert(0,'')

        self.btnPickFile = Button(parent,text='Pick File', command=self.btnPickFileClick)
        self.btnPickFile.place(x=6,y=20,width=96,height=24)

        self.btnRun = Button(parent,text='RUN', command=self.btnRunClick)
        self.btnRun.place(x=6,y=114,width=96,height=30)


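    # Methods (event handlers) of object
    def btnPickFileClick(self):
        fileName = askopenfilename()
        # The dialog returns an empty value if it is cancelled, so only
        # replace the current entry when a file was actually picked
        if fileName:
            self.txtFile.delete(0, END)
            self.txtFile.insert(0, fileName)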

    def btnRunClick(self):
        fileName = self.txtFile.get()
        self.runFile(fileName)

    def runFile(self, fileName):
        print('code goes here to run when click: ' + fileName)
        showinfo('The file is',fileName)

# Runs only when the script is executed directly,
# not when it is imported as a module
if __name__ == '__main__':
    root = Tk()
    root.title('Process File')
    myForm = FileProcessor(root)
    myForm.pack()
    root.geometry("423x156")
    root.minsize(423,156)
    root.maxsize(423,156)
    root.mainloop()

Pick a directory and do something with it:


from tkinter import *
from tkinter.messagebox import *
from tkinter.filedialog import *

class DirectoryProcessor(Frame):

    # Object constructor
    def __init__(self, parent=None):
        Frame.__init__(self, parent)

        self.txtDir = Entry(parent)
        self.txtDir.place(x=6,y=64,width=375,height=24)
        self.txtDir.insert(0,'')

        self.btnPickFile = Button(parent,text='Pick Directory', command=self.btnPickFileClick)
        self.btnPickFile.place(x=6,y=20,width=96,height=24)

        self.btnRun = Button(parent,text='RUN', command=self.btnRunClick)
        self.btnRun.place(x=6,y=114,width=96,height=30)


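    # Methods (event handlers) of object
    def btnPickFileClick(self):
        dirName = askdirectory()
        # The dialog returns an empty value if it is cancelled, so only
        # replace the current entry when a directory was actually picked
        if dirName:
            self.txtDir.delete(0, END)
            self.txtDir.insert(0, dirName)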

    def btnRunClick(self):
        dirName = self.txtDir.get()
        self.runDir(dirName)

    def runDir(self, directoryName):
        print('code goes here to run when click: ' + directoryName)
        showinfo('The directory is',directoryName)

# Runs only when the script is executed directly,
# not when it is imported as a module
if __name__ == '__main__':
    root = Tk()
    root.title('Process Directory')
    myForm = DirectoryProcessor(root)
    myForm.pack()
    root.geometry("423x156")
    root.minsize(423,156)
    root.maxsize(423,156)
    root.mainloop()
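For the "plug in the function name" step, here is a minimal sketch of the kind of module the templates could call. The module (say, a file named myprocessor.py) and its two functions, process_file and process_dir, are hypothetical. Counting lines is just a stand-in for real work. In runFile you might replace the print with something like myprocessor.process_file(fileName), and in runDir with myprocessor.process_dir(directoryName).

```python
import os


def process_file(file_name):
    """Hypothetical processing step: count the lines in one text file."""
    with open(file_name, encoding="utf-8") as f:
        return sum(1 for _ in f)


def process_dir(directory_name, extension=".txt"):
    """Hypothetical processing step: run process_file on every matching file."""
    results = {}
    for entry in sorted(os.listdir(directory_name)):
        path = os.path.join(directory_name, entry)
        if os.path.isfile(path) and entry.endswith(extension):
            results[entry] = process_file(path)
    return results
```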

Thursday, September 30, 2010

Putting a string on a single line

Often, text is written on multiple lines for the sake of clarity, and sometimes those lines need to be consolidated into a single line. This happens to me a lot with SQL queries that end up needing to be in a config file somewhere. I'll write them out all pretty so they make sense (as much sense as some SQL can make...) and then I'll need to consolidate that down to one line. Here is a Python one-liner that does just that, given a string named mystr:

" ".join([x.strip() for x in mystr.split("\n")])
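The one-liner leaves a stray space wherever the text has a blank line, including a leading or trailing newline, because the empty line still gets joined. Here is a small variant wrapped in a function (the name to_single_line is just for illustration) that skips blank lines:

```python
def to_single_line(text):
    # Strip each line, drop the blank ones, then join with single spaces
    return " ".join(line.strip() for line in text.splitlines() if line.strip())


query = """
    SELECT Name,
           Department
    FROM Employees

    WHERE Active = 1
"""
print(to_single_line(query))
# SELECT Name, Department FROM Employees WHERE Active = 1
```

Whitespace inside a line, such as spaces in a quoted string literal, is left alone. Only the line breaks and the indentation around them are collapsed.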

Wednesday, September 8, 2010

Don't make me slap you, Beavis

Every single time somebody asks "how hard can it be?" or "why is the estimate so big?", I want to smack them. Hard. Some folks at 37signals were kind enough to offer an example that answers these sorts of questions when asked about a seemingly simple feature.

I applaud the fine folks at 37signals for publishing things like this.

Monday, August 9, 2010

splitting csv files

I'm sure I've ranted about this before, and I'll rant about it again. I just hate seeing C# examples where people say you can split a CSV file with mystring.Split(',') and get an array where each item is the value for one field. One would assume, since they mention C#, that at some point in their lives they have worked with Excel-style CSV files that have data like:

"Bob","Hello, Dear","""Dude, where's your car"""

The string split method obviously will not correctly handle this case at all. Here is a page I found with a nice little function that works for me in most cases.

http://www.tedspence.com/index.php?entry=entry070604-124237

Hopefully this will work for you, too.
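The linked page is about C#. For comparison, here is the sample line run through Python's standard csv module. That module follows the quoting rules a plain split ignores: fields wrapped in quotes, and doubled quotes inside them. This only illustrates the problem; it is not the function from that page.

```python
import csv
import io

line = '"Bob","Hello, Dear","""Dude, where\'s your car"""'

# Naive split: breaks inside the quoted fields and keeps the quote marks
naive = line.split(",")
print(len(naive))  # 5

# csv.reader understands quoted fields and "" escapes
fields = next(csv.reader(io.StringIO(line)))
print(fields)  # ['Bob', 'Hello, Dear', '"Dude, where\'s your car"']
```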

Friday, July 30, 2010

I leave it as an exercise to the reader...

When did this become a euphemism for "this is hard"?

I say this because I'm doing research on Visual Studio 2010, and one of the selling points is the Parallel Extensions for .NET. In a great many of the articles and tutorials I've found, the author shows the basics with some totally brain-dead examples that have no relevance to real life. This is followed by a discussion of some of the more interesting parts of the framework, then a paragraph or two about things you might try, and finally something like "I leave it as an exercise for the reader to..." followed by something that might actually be interesting.

This is one of the reasons, by the way, that I prefer actual books to cobbling together a manual from various web pages. I know a lot of people say they will never buy another programming book because you can find everything you need on the web. I have not found this to be the case. And even when it is, one often has to go through a lot of effort to cull the good nuggets from the pile of information. It's almost like panning for gold. Most good books do this for you already, and the really good ones even include some decent examples of using libraries and techniques that have some relevance to real life.

I just needed to vent. Back to my research.

Friday, July 23, 2010

Working with outlook

I found a nice piece of software called Outlook Redemption. I had to process some .msg files that I had exported, and this thing came in handy and was super simple to use. For what I needed, this was enough to get going.

From time to time I'm still amazed at the software that is freely available to just get things done and move on with your life.

Saturday, June 19, 2010

link with good info on posting to page

Here is some information on posting to pages in Facebook using the API:

http://forum.developers.facebook.com/viewtopic.php?id=45796

Friday, June 18, 2010

NULL is a four letter word

I used to be a big fan of NULL. It represents something that isn't really there, and in many, many cases it makes sense logically and mathematically. I even had a good way of explaining it to laypeople who didn't really get it: think of it like bank accounts and account numbers for those bank accounts. An existing account with a balance of zero is different from not having an account at all, right? This worked for most people to help explain the difference, especially in text fields where the value of some fields would be the empty string and others would be NULL.

Lately, though, I've come to the realization that this doesn't really matter. You can explain it in a multitude of ways that make sense, but when Judy in marketing wants to send an email to all of the people who are not in the Accounting department, she doesn't care about NULL. As the more technically or mathematically inclined among us know, if the department field for somebody is NULL, in most databases comparing it to a string yields neither true nor false but unknown, because there is no value to compare. Rows where the condition is unknown are filtered out.

Select * from Employees where dept != 'Accounting'

will not return the people where dept is NULL in most cases. And this is usually very bad in the real world.

I'm done with NULL if anything close to a reasonable default exists. It just eliminates so many real-world problems. From time to time I've come across people on discussion boards who have expressed this view, and for the longest time I disagreed with them on principle, as have many others I know. After much wailing and gnashing of teeth, I finally get it.
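Here is a small, self-contained sketch of both halves of that argument. It uses SQLite through Python's sqlite3 module rather than SQL Server, but the NULL comparison behaves the same way under standard three-valued logic (in SQL Server, with the default ANSI_NULLS ON). The table names, columns and people are hypothetical, and using '' as the "no department" default is just one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Nullable column: one employee has no department at all
conn.execute("CREATE TABLE WithNull (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO WithNull VALUES (?, ?)",
                 [("Ann", "Accounting"), ("Bob", "Sales"), ("Cy", None)])

# NULL != 'Accounting' is unknown, not true, so Cy is silently dropped
with_null = conn.execute(
    "SELECT name FROM WithNull WHERE dept != 'Accounting' ORDER BY name").fetchall()

# Alternative schema: no NULLs, '' is the default when no department is given
conn.execute(
    "CREATE TABLE WithDefault (name TEXT NOT NULL, dept TEXT NOT NULL DEFAULT '')")
conn.execute("INSERT INTO WithDefault (name, dept) VALUES ('Ann', 'Accounting')")
conn.execute("INSERT INTO WithDefault (name, dept) VALUES ('Bob', 'Sales')")
conn.execute("INSERT INTO WithDefault (name) VALUES ('Cy')")  # dept becomes ''

with_default = conn.execute(
    "SELECT name FROM WithDefault WHERE dept != 'Accounting' ORDER BY name").fetchall()

print(with_null)     # [('Bob',)]
print(with_default)  # [('Bob',), ('Cy',)]
```

Judy's query gets the answer she expects from the second table without anybody having to remember to add "or dept is null".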

Consider it the 8th dirty word, after fuck, shit, piss, cunt, cocksucker, motherfucker and tits.

Thursday, June 10, 2010

Fun with HTML 5

I found a sample that had a simple particle system in HTML5. I made one small improvement (passing in the type of particle to render) and changed it from raindrops to what could be a simple campfire. I like fire :-)

I put this up on a page here

Monday, May 17, 2010

test facebook like button

Friday, February 19, 2010

The cyclical nature of programming

Anybody who has been in this business for a long time knows that there are a lot of cycles in technology: boom/bust cycles, and cycles where we process a lot of stuff on servers, then distribute it to nodes on the network, then move back to processing a lot of stuff on servers. I'm sure there are others, but these are the two big ones that come to mind.

One thing I've noticed lately is that this is happening over on Hacker News. Over the past couple of weeks there appear to have been a lot of links to items that I read many years ago. Some of them weren't even new then. Lots of Peter Norvig stuff. Some links to old Dijkstra papers. That sort of thing. Things that should be part of the collective programmer consciousness, I guess.

I think this sort of thing is why we spend a lot of time reinventing the wheel, as it were. You don't have to look too hard to find a large group of software folks who are convinced that credentials and formal schooling are waaaay overrated. But then you run into these periods where a large group seems to stumble upon some piece of writing or some paper and has to share it with the world, as if they've discovered something new and exciting and awesome. While I respect and love the enthusiasm, I wonder how much overlap there is between these two groups. I would guess a lot. Maybe that's just my natural pessimism and cynicism kicking in.

I know that several well-known people have lists of required reading for programmers. It just seems that these lists aren't very well known in the programming community at large. I've passed some of them around to various people. Some of them really bug me, however, because they are ephemeral. I know a lot of people, for example, who used to recommend one or more Extreme Programming books, very enthusiastically. Now they wouldn't touch them with a ten-foot pole. They've moved on to Scrum, and soon, I imagine, they will move to some yet unnamed flavor of the month, mostly in an effort to collect more consulting revenue.

I think I'm going to start collecting some of these. I'd like to break this particular cycle. Here are a few that come to mind. They mostly contain basic fundamentals or things that have stood the test of time, the sorts of things that are, in some cases, part of the collective hacker unconscious. Eric Raymond has written about this hacker culture stuff and has done a much better job than I could hope to do.

Eric can be polarizing, but he has some great writings. No matter what side of the ESR fence you sit on, at least he makes you think.
How to Become a Hacker
The Cathedral and the Bazaar
The Art of Unix Programming

norvig.com - Just an all around smart guy with lots of good things to say.

Joel on Software - The forums on this site used to be GREAT. The business of software board is still pretty good. The older essays are definitely worth a read.

The Mythical Man-Month, by Fred Brooks. A classic book. Short, too, which is nice :-)

I'll add others as they come to mind.

Wednesday, February 10, 2010

Writing more code isn't always the answer

This is one of my very favorite stories of writing good code and how metrics can go horribly wrong.

http://www.folklore.org/StoryView.py?project=Macintosh&story=Negative_2000_Lines_Of_Code.txt

"Bill Atkinson, the author of Quickdraw and the main user interface designer, who was by far the most important Lisa implementor, thought that lines of code was a silly measure of software productivity. He thought his goal was to write as small and fast a program as possible, and that the lines of code metric only encouraged writing sloppy, bloated, broken code. "

Generally speaking, I'm with Bill.
