Wednesday, December 16, 2009

Finding potential problems in CSV files

I periodically have to work with CSV files from a variety of sources, and the problems with them are pretty well known. Here is a little Python script I use that helps me find values that contain double quotes, which are often not properly escaped in the files I receive:

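import sys

fn = sys.argv[1]  # path to the CSV file, assumed to be given on the command line

with open(fn, "r") as f:
    for line in f:
        # Naive split on commas, then look for a double quote anywhere
        # other than the first or last character of the value.
        for v in line.split(","):
            if '"' in v.strip()[1:-1]:
                print(v)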
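
For reference, a properly escaped double quote inside a CSV value is usually doubled, with the whole value wrapped in quotes. Here is a small sketch using Python's csv module to show what that looks like; the helper name to_csv_line and the sample values are made up for illustration.

```python
import csv
import io


def to_csv_line(values):
    """Write one row with the csv module and return it without the line ending."""
    out = io.StringIO()
    csv.writer(out).writerow(values)
    return out.getvalue().rstrip("\r\n")


# The embedded quote is doubled and the value is wrapped in quotes.
print(to_csv_line(["widget", 'the 6" model', "9.99"]))
# widget,"the 6"" model",9.99
```

Note that the script above only looks for any embedded quote, so it will print correctly escaped values like this one too. It narrows down where to look rather than proving a value is broken.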

Sunday, December 6, 2009

Javascript, sleep and closures

My wife is working on a web site and she needs a different div to show up every few seconds, ad nauseam. I hack in JavaScript enough to be dangerous, so every time I try something new or something I haven't done in a long time, I turn to Google. My idea for this was to simply have a loop that goes through the elements, makes one visible, then sleeps for a few seconds, shows the next one, etc. So I thought I would google javascript sleep and see what came back.

What I got was, to me at least, mostly a horror show.

Lots of examples with busy waiting and splitting functions up and a lot of code that worked, insofar as it did what somebody wanted. But the code was hard to read, hard to parse, hard to update and in some cases would burn up all the CPU you let it. Knowing all of the features available in JavaScript I KNEW there had to be a better way, but I wasn't finding anything I thought was "good", at least in the first page of Google results. Maybe I just didn't look hard enough. But at this point I figured if I thought about this for a few minutes I would come up with a better solution in less time than I could find one that pleased me. Here, after a few minutes of thought, is what I came up with:




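function divrotate(interval, divarray)
{
    // Local state captured by the showNext closure below.
    var currI = 0;
    var currLen = divarray.length;
    var currE = document.getElementById(divarray[currI]);
    currE.style.display = '';

    // Hide the current element, advance (wrapping around), show the next one.
    var showNext = function()
    {
        currE.style.display = 'none';
        currI++;
        if (currI == currLen) currI = 0;
        currE = document.getElementById(divarray[currI]);
        currE.style.display = '';
    };

    // Pass the function itself rather than a string to be evaluated.
    setInterval(showNext, interval);
}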



Here is what the call would look like from your HTML page:

<script src="/divrotate.js" type="text/javascript">
</script>

<script type="text/javascript">
divrotate(8000, new Array("section1","section2","section3"));
</script>


What you are passing in is an interval, in milliseconds, and the array of element ids (in this case DIVs) that you want to iterate through. The divrotate function gets the current element and shows it, then a closure is created that will hide the current element, get the next one and show it. setInterval is used to call the closure at the interval specified by the caller.

Simple code, and it doesn't peg the CPU.

I am sure that there have got to be some other examples that are straightforward and allow the equivalent of sleep. This seems like too common a thing to not be elegantly handled somewhere, but quick searches in scriptaculous and mootools don't show me anything promising. I see lots of forum posts with the usual "Use setInterval or setTimeout, you'll see lots of examples if you Google for it!"

I know this is far from a generic sleep example, but I hope this helps somebody that wants to do something similar. I also hope that it serves as a decent example of closures. Closures are a powerful language construct that, judiciously used, can be of great benefit when they are available. A good discussion of closures in JavaScript can be found here, and upon reading it there is an example that comes close to what I have done above, but the example is generic enough that it does not show the use of variables that are in scope.
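
For anyone more at home in Python, here is a rough sketch of the same closure idea: an inner function that keeps using variables from the enclosing function's scope after that function has returned. The names make_rotator and show_next are made up for illustration, and instead of touching a page it just returns the id of the element that would be shown next.

```python
def make_rotator(items):
    """Return a function that gives the next item on each call, wrapping around."""
    index = 0  # lives in the enclosing scope and is captured by show_next

    def show_next():
        nonlocal index
        # Advance, wrapping back to the start after the last item.
        index = (index + 1) % len(items)
        return items[index]

    return show_next


rotate = make_rotator(["section1", "section2", "section3"])
print(rotate())  # section2
print(rotate())  # section3
print(rotate())  # section1
```

The first item counts as already showing, just as divrotate shows the first div before the timer starts, so the first call moves on to the second item.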

Monday, November 2, 2009

Simple utility for getting data from CSV files

Most programmers at some point spend a lot of time hacking apart CSV files. I recently got to yank apart yet another CSV and while Excel is handy for these sorts of tasks, it can't do everything quickly, so I hacked up a little utility using PHP 5 and SQLite to help query data and get results fast.

Here is csvsql

You just upload the file (the first row is treated as headers) and type in the filter part of a where clause to narrow the data. Click a button and it puts out a CSV representation of the data that should be Excel friendly.
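
The utility itself is PHP, but the general idea can be sketched in a few lines of Python with the standard csv and sqlite3 modules. This is only an illustration under some assumptions, not the csvsql code: the function name query_csv and the table name data are made up, every row is assumed to have the same number of columns as the header, and the filter text is pasted straight into the query, so it should only be used with input you trust.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, where=None):
    """Load CSV text into an in-memory SQLite table and return matching rows as CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers, data = rows[0], rows[1:]

    # Quote column names so headers containing spaces still work.
    cols = ", ".join('"' + h.replace('"', '""') + '"' for h in headers)
    marks = ", ".join("?" for _ in headers)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (" + cols + ")")
    conn.executemany("INSERT INTO data VALUES (" + marks + ")", data)

    sql = "SELECT * FROM data"
    if where:
        # The filter is used as-is, like typing the tail end of a where clause.
        sql += " WHERE " + where
    result = conn.execute(sql).fetchall()
    conn.close()

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(headers)
    writer.writerows(result)
    return out.getvalue()


sample = "name,city\nAnn,Detroit\nBob,Flint\n"
print(query_csv(sample, "city = 'Detroit'"))
```

Everything is loaded as text in this sketch, so a numeric filter would need something like CAST(age AS INTEGER) > 30 rather than a bare comparison.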

Thursday, September 24, 2009

A Post After My Own Heart

Joel writes about what he calls Duct Tape Programmers. Best thing he's written in a while, if you ask me. Not that you did, but it is definitely worth a read. I'm also planning on picking up Coders At Work, which he mentions in the article. Sounds like a great read.

Tuesday, September 8, 2009

"Anonymous" data

Just one more reason to be very careful about what information you put on the internet. Most people think gender, zip code and birthdate won't give much away. Most people are wrong.

Tuesday, August 11, 2009

This is just plain cool

When I was at U of M my software engineering professor would talk about how small storage and computers were becoming and how we won't be needing these big boxes anymore.

I saw this on Hacker News and it just looks cool. With a bunch of SheevaPlugs you can turn a power strip or two into a server farm :-)

Sunday, August 9, 2009

Rails Stuff

I've finally been hacking on some code this weekend for my Rails project. I want to have people log in with user credentials, and a couple of resources on using restful_authentication that I found EXTREMELY useful are here:

A great demo
http://media.railscasts.com/videos/067_restful_authentication.mov

and

if you get an error when trying to log in

http://blog.anthonychaves.net/tag/rails


Also the controller for my default class had this added to it so we don't get into an infinite loop when trying to log in for the first time

skip_before_filter :login_required

I also added this on the user controller so we can go create a new user without having to log in. Duh!

I'm using Rails for .NET Developers from The Pragmatic Programmers to guide me, along with lots of online docs. I'm also learning Ruby as I go. It seems to share with Python the same notion of 'what you expect will work'. Pretty much I've been typing things and they are working, without too many tweaks. It's kind of nice. I'm doing super simple run of the mill stuff, so I'm not expecting too many things to blow up as of yet.

Tuesday, August 4, 2009

Best Quote Talking about the Apple App store ever

"Apple requires you to be 17 years or older to purchase a censored dictionary that omits half the words Steve Jobs uses every day."

This little gem is from what appears to be an increasingly common scenario of rejections leading to rants against the App Store. I'm sure the Apple Faithful don't mind (I'm sure the faithful would be happy to watch Steve give their spouses/partners a rim job and then split them open, bathe in their entrails, cook them and then eat them), but for the average developer, it would seem that a lot of Apple's coolness is wearing very, very thin.

Here's the full post about Ninjawords and their debacle.

Friday, July 17, 2009

Software Engineering and Metrics

Tom DeMarco (Yes, THAT Tom DeMarco) wrote an interesting piece on his current view of Software Engineering. Definitely worth the read.

Wednesday, July 15, 2009

Clouds can be Dangerous

Agree or not with TechCrunch's decision to publish some data from some Twitter documents they received, I think the most important thing to note is what they say about using Gmail and other cloud services:

"It’s not our fault that Google has a ridiculously easy way to get access to accounts via their password recovery question. It’s not our fault that Twitter stored all of these documents and sensitive information in the cloud and had easy-to-guess passwords and recovery questions. We’ve been sitting in the office for eight hours now debating what the right thing to do is in this situation. We’ve spoken with our lawyers. We’ve spoken with Twitter. And we’ve heard what our readers have to say. All of that factors in to our decision on what to post or not to post."

I have been wondering how many people will need to get burned and to what degree before they start taking this sort of thing seriously. Given that Google's entire business model is selling targeted advertising, they have an incentive to collect as much data about you as possible. You would think for this reason alone people would be wary of dumping too much stuff into Google's hands. The annoyance factor would get to be outrageous, I would think. That's not even considering that Google has to be a HUGE target for any sort of cracker that wants to track down any kind of information. I'm sure they do their very best to keep everything as locked down as possible, but it's really hard to compromise the information if it isn't there to be compromised in the first place.

Friday, July 10, 2009

The current and future state of programming

An interesting read from Philip Greenspun. Apparently some students were able to find their admission status by modifying a URL. This, apparently, qualifies as "hacking". Ugh.

I am particularly fond of the sentence "As progressively dumber programmers build progressively more complex systems we will see more of this kind of attempt to paper over coding mistakes with lawyers, sanctions, policies, and laws." I know people that have been lamenting this sort of thing for years. To hear it put so clearly is refreshing. Not that anybody will really care. Until their credit cards and bank accounts get hacked, that is. But by then it will be too late. And they'll care for about 20 minutes and then get back on with their lives. *sigh*

Thursday, July 9, 2009

NIH Syndrome

Most people in software are at least aware of Not Invented Here syndrome. We're in the middle of fighting with it right now, but from the other side: we have some dependencies that are causing issues. We rely on some libraries for a very important piece of our software, the library has been upgraded, and if we use the upgrade we are, for all practical purposes, going to have to do a rewrite. So I was going to write this big long article in defense of NIH and why it makes sense to roll your own sometimes, but then I remembered that Joel on Software had tackled this already.

I reread the piece and it is basically saying the exact same thing I was going to say, so here it is. I can't say that I would go to Joel's extreme of "If it's a core business function -- do it yourself, no matter what.", but you need to seriously think about the cost of upgrading dependencies if the dependency is a core piece of what you are doing. It's not a matter of IF it will bite you in the ass, but WHEN and how hard.

Friday, June 19, 2009

Reviewing files from Subversion

Recently I had to deploy some updates to a website and wanted to get a list of all the files that had been updated to make sure I got everything. With recent versions of Subversion and using Python this turned out to be rather simple:

First, get a list of all the files that have been checked in and stick them in a text file.

C:\Utils\svn-win32-1.4.3\bin>svn diff --summarize -r902:966 svn://myServer/myRepository/myProject/trunk > c:\diff.txt

Then use the following lines of Python to get the file names and sort them to make tracking them down easy:

mylines = open("c:\\diff.txt" , "r").readlines()
myarr = [x.split(" ")[-1].strip() for x in mylines]
myarr.sort()

Now print out myarr, put it in its own file or do whatever you want with it to deploy or review your updates.
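
If any of your paths contain spaces, splitting on a single space and taking the last piece will cut the path short. Here is a slightly more defensive sketch, assuming each line of the summary is a short status flag followed by whitespace and then the path. The function name changed_paths and the sample lines are made up for illustration.

```python
def changed_paths(summary_lines):
    """Return the sorted paths from `svn diff --summarize` style lines."""
    paths = []
    for line in summary_lines:
        line = line.strip()
        if not line:
            continue
        # Split once on whitespace: the status flag first, the path is the rest.
        parts = line.split(None, 1)
        if len(parts) == 2:
            paths.append(parts[1].strip())
    return sorted(paths)


sample = [
    "M       trunk/b.php\n",
    "A       trunk/a file.php\n",
    "\n",
]
for path in changed_paths(sample):
    print(path)
```

With a real diff file you would pass in the lines read from it, for example changed_paths(open("c:\\diff.txt")).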

Thursday, June 18, 2009

False Productivity

I'm finishing up a project for a buddy of mine that is in php. It's an old school simple CRUD app and as such I didn't go too overboard with the functionality. It does what the client asked and the code is reasonably clean but not something that you could refer to as having much of an architecture.

Since it is all raw HTML and PHP I had to code everything up by hand. At my regular job we do almost everything in ASP.NET. Working on these two things side by side, and after roughly clocking the hours of similar projects, got me thinking that a lot of the productivity I've been feeling I get from ASP.NET is false. As one of my co-workers is fond of saying, the complexity has to exist somewhere, all we really do is move it around. This struck me keenly because my layout and design skills are not great, so even coding up a simple interface that does what it is supposed to takes a little bit of effort, and since I'm not using any frameworks at all for the PHP app, I got to code it all by hand. In the ASP.NET world you would drag and drop a couple of controls and be done. Poof!

But, then, the data access layer coding began. And the DAL for the PHP app was dead simple to code. And it was doing some reasonably interesting things. Nothing crazy, but a couple of joins here and there, your usual stuff. In ASP.NET, once you go outside their little box, things become somewhat painful. I've had instances where putting together the interface was drag and drop a few things and then the prototype of the data access was a few minutes of drag and drop, but then things start happening. The production data has millions of rows and your test data only has a few thousand, so the paging and the data access for the paging need to be recoded. Oh, wait, there's a fancy new datagrid widget doo-hickey that your manager read about somewhere. Can we use that? And on and on with the little things that make a seemingly quick and simple job time consuming. Just like that the productivity you thought you had gained went up in smoke. In some cases it actually gets worse because you either have to start over or make your additions fit into what you've written already AND make it fit into the ASP.NET universe.

I guess the point here is that things aren't always as time saving as they appear. I know people say a lot of things. My language is X more productive than yours. You get demos from vendors showing how easy something is. Three drags, a double click and a pinch of salt and you have a fully functioning app! Just be suspicious of such claims. Chances are the three drag demo has little resemblance to the actual work you do.

Friday, May 29, 2009

Well, I'm drinking the Kool Aid

My brother-in-law and I had an idea for a project and from what he and many others have said, it would be a perfect Rails project, so I downloaded Rails and got a copy of Rails for .NET Developers and I started reading last night. It looks like it should do the trick. This is a pretty simple application, so it seems like a great fit for what Rails is supposed to be good for. Having seen some of the things people put together for Rails Rumble, I'm hoping I can bang this thing out fairly quickly.

Anyway, given I've done my share of Python, C++, C# and VB with a smattering of JavaScript and PHP and vague memories of Lisp and a bitter taste in my mouth for Perl, I'll see how this works out.

Sunday, May 10, 2009

Thank you Microsoft

Look, a lot of us in this industry bash Microsoft and in many cases rightfully so. But this weekend I was doing some work for a friend who has a client that has a site running IIS, SQL Server and PHP. I did the prototype of what I'm working on in PHP and MySQL on my laptop and then had to port the database layer over to SQL Server. This app is far from rocket science, so I wasn't worried in the least about doing the first version against MySQL. So I downloaded SQL Server, installed it on my laptop after hitting many Next> buttons, updated my code to use the standard mssql module that comes with PHP and then started testing. That's when the problems started. I did the usual googling and came up with some things I had forgotten to do on the install, like enable TCP/IP in the Configuration Manager and other assorted bookkeeping stuff. But after that and getting other clients to connect successfully, the PHP app kept having issues connecting, never mind doing any actual work. I had done a ton of research, tried everything I found and then some, and finally gave up trying to use the standard PHP module.

Since in my regular job I work almost exclusively with Microsoft technology, I am mostly up to date on what they do. I know as of late they've been doing some work to help support a lot of web and open source technology out there. So I was wondering if maybe, just maybe, they had a PHP client. They have a database client for Java, after all. So I figured what the hell and found the SQL Server driver for PHP 1.0. After downloading the file, I think it was just over an hour from extracting the files to the PHP extensions directory until I had a working version on SQL Server. I've had somewhat similar experiences lately with Flickr and Facebook tools found on CodePlex.

The one thing that made this very painless is that everything I needed to get this installed and configured was all in one place. The help file was actually helpful and told me what I needed to know to get things going. I didn't have to hit half a dozen websites in order to figure out I needed to have the SQL Server 2005 native client driver installed. It just told me. Granted it was annoying to have to uninstall the 2008 driver and download and install the 2005 driver, but given a co-worker's recent experiences with setting up Ubercart, this was completely painless. He spent HOURS going from site to site, getting modules, downloading updates, installing dependencies when the modules he installed didn't come with everything he needed, etc. I'm glad it wasn't me. I probably would have given up. Personally I find this the most frustrating part about the open source community. I don't understand how projects like this gain the traction they do. They're hard to set up and hard to keep updated because in many ways these things are like a house of cards. I use some open source tools (Python being by far my favorite) but, by and large, I find the lack of good documentation (by good I mean useful. I've found a LOT of documentation on most things, but generally it isn't worth the electrons spent rendering it on my screen) to be a HUGE hurdle to overcome. There is no reason I should have to spend hours and hours searching for documentation that still doesn't fix the problem. In general the open source community doesn't understand why people don't use their products more. How can they keep flocking to "Micro$oft Windoze" and their other schlock when there are cheaper and superior alternatives available???? Well, I think documentation is about 70% of the answer to that question.
And the usual response is that open source projects are, in a lot of cases, done by people for free in their spare time and, let's face it, I do not know a single developer that likes writing documentation that is for end users. Well, if you don't like doing that, people aren't going to be able to use your product, no matter how superior it might be in a technological sense.

So I wanted to take a minute to say thank you to the people at Microsoft for putting these types of tools together and for the community that surrounds CodePlex and other similar sites. As a company that is generally viewed as closed and being competitive to a fault, it is nice to see that this is pretty much a caricature of the organization. They obviously have their issues, but I think they are really beginning to realize that if they are ultimately going to not only survive, but thrive, that they need to embrace and support a lot of the other good work going on out there. There are, of course, plenty of selfish reasons to do this, but there are just as many reasons not to do this and it is probably easier not to.

In general I think this is starting to be a return to what made Microsoft the dominant software company on the planet. When Office and Windows were first born, they had a LOT of competition and, as others have pointed out, one of the things that made Microsoft software good back in the day is that they went to great lengths to interoperate with other software out there. Excel worked well with Lotus files and Word was able to read and write WordPerfect files, for example. Not to mention DOS and Windows being able to run on a wide assortment of machines. Maybe not well all the time, but well enough.

Friday, April 24, 2009

Case insensitive validators in .NET

I have just done some work on a control that needed to have some validation for dynamically generated text boxes. I did some poking around for a way to add validators dynamically that would run on the client side and check for words case insensitively. I found a few solutions to this that involved server side validation, but I wanted to keep it on the client. The ASP.NET validators use JavaScript on the client side and there isn't anything I could find that would easily let me do a case insensitive compare on the client side using the .NET controls. For example, I couldn't pass /sometext/i to the validator and have the 'i' attribute recognized.

A solution hit me on the way home. I was checking for the common occurrences of 'true' and 'false', 'True|False|true|false'. That was probably going to be fine, but every now and then you hold the shift key a second too long or you have caps lock down and you enter TRUE or FAlse. Those wouldn't work out: while they would evaluate to a boolean once they hit the server, they would never get there because of the validation.

So now my expression looks like '[Tt][Rr][Uu][Ee]|[Ff][Aa][Ll][Ss][Ee]'. That does the trick very nicely. It isn't great to read, but this is a very specific case and I've certainly seen worse. And I don't need to do a mix of server and client side validation now, which makes me happy. I wouldn't do this for something that was involved, but for this particular case it is simple to implement, not totally unreadable and prevents a useless round trip to the server.
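
To convince yourself the character-class trick behaves like a case insensitive match, here is a quick sketch in Python rather than in the validator itself. The helper name is_boolean_text and the sample inputs are made up for illustration, and re.fullmatch is used on the assumption that the whole value has to match the expression.

```python
import re

# The character-class pattern from above: each letter accepts either case.
PATTERN = "[Tt][Rr][Uu][Ee]|[Ff][Aa][Ll][Ss][Ee]"


def is_boolean_text(value):
    """True if value is 'true' or 'false' in any mix of upper and lower case."""
    return re.fullmatch(PATTERN, value) is not None


for sample in ["true", "TRUE", "FAlse", "yes", "truee"]:
    print(sample, is_boolean_text(sample))
```

The first three samples match and the last two do not, which is the same result you would get from 'true|false' with a case insensitive flag, if the validator allowed one.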

Wednesday, March 11, 2009

11, 58, 29, 16, 33, 21, 24, 29, ...

These are the minutes remaining on a large file copy (6+ GB) in Windows. Why bother giving the estimate?

That's today's rant. Back to waiting for the file copy to finish. 8 minutes to go. Maybe...

Monday, February 23, 2009

8 steps to millions?

Doesn't this sound easy?

http://www.guardian.co.uk/technology/gamesblog/2009/feb/10/gameculture-apple

I wonder how many people started cranking out yet another game after reading this. I'm half tempted.

Sunday, January 4, 2009

Agile

Generally speaking, I'm not a big fan of Agile Development for two reasons.

1. All the hype.

2. A lot of contradictory information and advice, sometimes from the same person. One time you hear that you have to adopt all the steps of a given methodology. Another time you hear that you can't be brain dead about adopting the steps, but need to pick and choose what works for you. A big selling point of Agile is little documentation, given that as a rule developers hate writing documentation. But then this gets taken to its illogical conclusion and much of the Agile work I've seen has zero design and planning. People just start coding and think the design will magically assert itself at some point. In fact, for a trade that generally prides itself on being logical, there is a lot, in practice, that I see as illogical about the adoption of Agile methodologies.

That being said, I agree with the position of most items on the Agile Manifesto. It's just that in practice I haven't seen it work well very often. Even the reported successes are usually qualified and, depending on the stakeholder you ask, the success really isn't a success.

To that end, I think this latest entry on Artima is a good read and crystallizes a few of the issues that lead to this state of things. http://www.artima.com/weblogs/viewpost.jsp?thread=246513