Thursday, January 26, 2012

Weird .NET Regex

I was working on a test for SimpleXml and encountered a really weird regex behavior.

I was trying to have a multiline regex match some xml to verify that it had been updated correctly.  I chose regex just because I thought it would be simpler than using an xml parsing library (other than SimpleXml, and I wasn't sure I liked the idea of using SimpleXml in the SimpleXml tests...).

For example, I was trying to match xml like this:
<root>
  <node>test1</node>
  <node>test2</node>
</root>
with a regex like this:
Regex.IsMatch(xmlString, "<node>test1</node>.*<node>test2</node>", RegexOptions.Multiline);
I expected it to match, but it didn't.  I tried all kinds of variations, throwing in end-of-line and start-of-line matchers, etc., and nothing worked until I found this:
Regex.IsMatch(xmlString, @"<node>test1</node>.*\s*<node>test2</node>", RegexOptions.Multiline);
For giggles I tried it with .*.* but that didn't work either.  The only pattern I found that worked was .*\s* and I really don't understand why.  So if you can explain why, I'd love to hear it!

update:
Thanks commenters!

Turns out there were 3 things I thought I understood about regex that I didn't:
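#1: As explained on regexlib.com, \s matches any white-space character, including \n and \r.  So that's actually all I needed.  No .* required, and no Multiline option required.  It also explains why .*\s* worked: the .* stopped at the end of the line, and the \s* consumed the line break and the indentation.
#2: Multiline doesn't change the behavior of . to make it match newlines like I thought.  It only affects $ and ^, so that they match at the end and start of every line, as explained on MSDN.
#3: Singleline is the option that changes the behavior of . to make it match \n, which is what lets .* span multiple lines.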

So, the final regex I needed was simply:
Regex.IsMatch(xmlString, @"<node>test1</node>\s*<node>test2</node>");
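
To see the three points side by side, here is a small sketch that runs each pattern from this post against the sample xml.  The class and method names are made up for illustration, and the sample string assumes \n line breaks and two-space indentation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsDemo
{
    // Sample document from the post; the \n line breaks and indentation are assumptions.
    public const string Xml =
        "<root>\n  <node>test1</node>\n  <node>test2</node>\n</root>";

    // '.' never matches \n under Multiline, so .* cannot cross the line break.
    public static bool DotStarMultiline()
    {
        return Regex.IsMatch(Xml, @"<node>test1</node>.*<node>test2</node>", RegexOptions.Multiline);
    }

    // .* stops at the end of the line, then \s* consumes the \n and the indentation.
    public static bool DotStarWhitespaceMultiline()
    {
        return Regex.IsMatch(Xml, @"<node>test1</node>.*\s*<node>test2</node>", RegexOptions.Multiline);
    }

    // Singleline is the option that lets '.' match \n.
    public static bool DotStarSingleline()
    {
        return Regex.IsMatch(Xml, @"<node>test1</node>.*<node>test2</node>", RegexOptions.Singleline);
    }

    // \s* alone is enough, with no options at all.
    public static bool WhitespaceOnly()
    {
        return Regex.IsMatch(Xml, @"<node>test1</node>\s*<node>test2</node>");
    }

    public static void Main()
    {
        Console.WriteLine(DotStarMultiline());           // False
        Console.WriteLine(DotStarWhitespaceMultiline()); // True
        Console.WriteLine(DotStarSingleline());          // True
        Console.WriteLine(WhitespaceOnly());             // True
    }
}
```

Note the @ in front of each pattern: without a verbatim string, C# rejects \s as an unrecognized escape sequence.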

Monday, January 23, 2012

SimpleXml

I released my very first Open Source project this weekend! It's called SimpleXml.  It's a tiny, single file, 180 line dynamic xml parsing library.  Really it's just a simple wrapper around XElement.

The source, issues, and docs are hosted on bitbucket: https://bitbucket.org/kberridge/simplexml/
And it's published to nuget: https://nuget.org/packages/SimpleXml

SimpleXml was inspired by PowerShell's xml support.  There have been a number of times I've wanted to do some small simple xml reading/writing job in C# and really wished I had the simplicity of PowerShell's xml api.  Now I do!

You can check out the bitbucket page for more examples, but here's a simple one:
dynamic x = "<root><child>myvalue</child></root>".AsSimpleXml();
Assert.AreEqual("myvalue", x.root.child);
It doesn't get much easier than that!
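
For comparison, here is roughly what the same read looks like with plain XElement, which is what SimpleXml wraps.  This is only a sketch of the LINQ to XML equivalent, not SimpleXml's internals, and the class and method names are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class XElementComparison
{
    // Reads the <child> value using LINQ to XML directly.
    public static string ReadChild(string xml)
    {
        // Parse returns the root element itself, so there is no ".root" step.
        XElement root = XElement.Parse(xml);

        // Element() returns null when there is no such child, so real code
        // would need a null check before touching .Value.
        return root.Element("child").Value;
    }

    public static void Main()
    {
        Console.WriteLine(ReadChild("<root><child>myvalue</child></root>")); // myvalue
    }
}
```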

Hopefully this will prove useful for someone, but my main motivation for creating it was just to have the experience of creating and open sourcing something simple from scratch.  It would be awesome to have the full experience of people forking the repo, and submitting pull requests too!

Saturday, January 14, 2012

CodeMash 2.0.1.2

Josh Schramm did a CodeMash recap.  And in the spirit of maximizing the value of your keystrokes (as presented by Scott Hanselman at CodeMash and blogged about by Jeff Atwood), I thought I'd do the same.

This year was my third CodeMash.  Every year I enjoy my time at CodeMash more than the last.  It was nearly 2x larger this year, but the "feel" of it didn't seem to change at all.

Precompiler
Vital Testing - Jim Weirich
Jim gave a good introductory talk to some of the elements of TDD that are hard.  He asked everyone to rate themselves in these categories, then focus on which categories they wanted to work on while TDDing some katas.

I really enjoyed his insight and perspective on TDD even though it was pretty basic.  But this session was still one of my favorites of the entire conference because Ben Lee spent it teaching me Erlang.  We did the Greed Dice Game kata and came up with this (nearly complete) solution in Erlang.  Erlang totally blew my mind and renewed my interest in functional languages.  I hadn't programmed in this style since college, so it was really awesome to get exposed to it again.

Day 1
Keynote
Keynote was good.  Ted Neward basically talked about being Pragmatic in how you approach building big systems.  He walked a fine line, saying that Enterprise needs to be simplified without ever saying that Enterprise is a bad word.  Also, he swore a lot.

Here are some of the recommendations I really liked:
  1. Resist the temptation of the familiar
  2. Reject the "Goal of Reuse"
  3. Befriend the uncomfortable truth
    1. be cynical
    2. question the assumptions 
    3. look for hidden costs
    4. investigate the implementations
  4. Eschew the "best practice"
  5. Embrace the "perennial gale of creative destruction" (AKA, you will have to learn new things)
  6. Context matters: create an evaluation function of your own for new tech
  7. Attend to goals
Inside the Microsoft Web Stack of Love - Scott Hanselman
Hanselman is an amazing presenter.  His room was overflowing 15 minutes before he was even scheduled to talk, but he kept everyone entertained by first typing funny stuff into notepad, and then playing YouTube videos.

He did a bunch of demos and made one umbrella point: that MS wants to unify all the tools under ASP.NET and encourage devs to combine these tools as needed.  For example, create an app that uses MVC, WebForms, SignalR, and Web API together.  I was most impressed with Web API, which as far as I could gather is just the new WCF REST stuff.  WCF is a really bad word in my office, because WCF was really awful.  We like to say it takes x time to write a WCF service, and 2x time to configure it.  But the new Web API looks a lot like MVC, without all the attributes!  So it's even cleaner!

Mastering Change With Mercurial - Kevin Berridge
I was very happy with how my talk went.  I probably spent 30+ hours preparing and practicing this talk on one of my favorite subjects: DVCS.  It was a combination of drawings, screenshots, and screen capture videos.  The most memorable part of it for me was how many questions I got.  People were very interested in Mercurial Queues in particular, which is a pretty complicated topic.  So I was glad I'd presented it in a way that obviously aroused people's curiosity enough to want to understand it better.

Functional Alchemy - Mark Rendle
Mark showed a bunch of different functional techniques implemented in C#.  The most memorable were a Memoize implementation, a .AsAsync extension method, and a clever Try Catch trick to DRY up catch blocks.  Just about everything he showed I intend to use at some point in our work projects.
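
To give a flavor of the Memoize idea, here is a minimal sketch.  This is my own guess at the general shape, not Mark's implementation; the names are made up, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public static class FunctionalExtensions
{
    // Wraps a one-argument function so each distinct argument is computed once.
    // Assumes non-null arguments (Dictionary rejects null keys) and single-threaded use.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(this Func<TArg, TResult> func)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            TResult result;
            if (!cache.TryGetValue(arg, out result))
            {
                result = func(arg);   // first time: do the real work
                cache[arg] = result;  // remember it for next time
            }
            return result;
        };
    }
}

public static class MemoizeDemo
{
    public static void Main()
    {
        int calls = 0;
        Func<int, int> square = n => { calls++; return n * n; };
        Func<int, int> fastSquare = square.Memoize();

        Console.WriteLine(fastSquare(4)); // 16
        Console.WriteLine(fastSquare(4)); // 16 (served from the cache)
        Console.WriteLine(calls);         // 1
    }
}
```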

I was able to ask him after his talk if there were any performance concerns with depending on lambda expressions so heavily in C#.  His answer was fascinating.  He said in .NET 3.5, the cost could be non-trivial, but in .NET 4, you could practically wrap every expression in a lambda if you wanted to.

Effective Data Visualization - David Giard
Visualization is a concept I've been really excited about recently, but haven't started to dive into much yet.  This was a fun talk with lots of examples of different visualizations.  And it presented many of Edward Tufte's rules, such as the Lie factor (size of the effect shown in the graphic / size of the effect in the data) and the Data-ink ratio (data ink / total ink used in the graphic).

Day 2
Dealing with Information Overload - Scott Hanselman
I went to this talk just to be entertained, but I think I will actually get something out of it.  Scott recommended making a list of all your data "inputs" and ranking them in terms of priority to YOU.  Stuff like work email, home email, twitter, facebook, google reader, and even TV.

C# Stunt Coding - Bill Wagner
I learned a couple new things in this talk about .NET's Expression object.  And the first example literally applied to the code I had open in my lap at that very moment, which was such a wonderful coincidence!  It also made me realize I need to spend some time digging through the framework.  They've added so much new stuff in 3.5 and 4.0 and I never took the time to really study the additions, as I figured I'd discover them eventually.

Capability vs Suitability - Gary Bernhardt
I actually was sitting in Applied F# waiting for it to start when Corey Haines tweeted that Bernhardt's talk was so great everyone really needed to attend.  And if you know me, then you know Bernhardt is kind of my hero.  So it was taking all my will power to stay in the F# talk, even w/ how pumped up I was about functional from my previous Erlang experience.  So once Corey Haines piled on, my will power lost out and I switched to Bernhardt.

It was an interesting talk with some cool history.  I think the biggest takeaway for me was his discussion of Activity vs. Productivity.  If you type really fast, you are productive at making characters appear on the screen, but that doesn't necessarily mean you are Getting Things Done any faster than the next guy.  So, Activity != Getting Things Done.  And I suspect I suffer from that.

His point with the Activity thing was that when you see a lot of activity, like in Ruby, that doesn't mean they're really accomplishing a lot of practical work.  He went so far as to say Java is probably where the real work is getting done, while the Ruby people are running around, being active, making lots of noise.

His broader point was when you see all that activity, it's probably an indication that there is something new happening.  That they are pushing the capability boundaries.  And when you look at the history of our industry those expansions of capability are usually followed by contractions to suitability.

It was a pretty thought-provoking talk.  Not least of all because I think it's an oversimplification that I don't fully agree with, but I haven't been able to put my finger on why yet.

Conversation - With Everyone
I go to most of the sessions at CodeMash 'cause I'd feel guilty if I didn't.  But what makes CodeMash worth my time is really the conversations with so many people doing so many different things with so many different tools on so many different platforms.  It's like what I try to do at Burning River Devs * 1000.  And it's what renews my energy for the rest of the year to keep fighting all the technical, process, and people related battles that come with building software.

Sunday, December 18, 2011

Stories of Productivity

The first time I tried pomodoro, it was exhausting.  Staying completely focused and working for 20 minutes straight tired me out!  I couldn't believe it!  I thought I was very focused, all the time.  I thought my productivity was good.  I couldn't even work for 20 minutes!

--

I used the demo of TimeSnapper for a while once.  It's a neat program.  It monitors the applications you use throughout the day.  It can even play back a video of what you did all day, greatly sped up of course.  You tell it which applications are productive, and which aren't, and it has this neat timeline graph that shows green for productive time, and red for unproductive time.  In using it I quickly discovered something very interesting that I was not consciously aware of.  As I was working, if I hit a point where I had to either wait for the computer, or I didn't know exactly what to do next, I would switch to an unproductive application.

For example, if I was coding an algorithm, and I hit a particularly difficult part of it, I'd pull up twitter.  Or hit up google reader.  Or check my email.  It was like some kind of weird nervous twitch.  Any time I had to ACTUALLY think, I'd go do something that didn't require thought.  And I was totally unaware that I was doing it.

--

Recently I was taking a screencast of myself doing some work at the prompt.  It was just a proof of concept, so I hadn't planned it out, and I was sitting in front of the TV at home.  I knew what commands I wanted to record, but I hadn't really thought through the examples.  You could see I was typing fast and quickly moving from command to command.  But then I'd hit a part where I had to make up some nonsense content to put in a file, or think up a commit message, and there would be this really long pause.  The pause was way longer than it actually took me to come up with the content.  What was happening was, as soon as I needed to do some creative thinking, I'd glance up at the TV and get lost for a few seconds.  And again, I was totally unaware this was happening until I watched the video.

--

One of the things I've been struck by when I watch Gary Bernhardt's Destroy All Software screencasts, or the Katacast he did, is how fast he is.  Now, he practiced this stuff, it's not like you're watching it come off the top of his head.  But even still, he's FAST.  But I realized, the thing I'm most impressed by is really not how fast he can type.  I mean, he can type fast, and very accurately.  What's most impressive is how he is always prepared for the next step.  He always has the next thing he needs to do queued up in his brain.

Once I noticed this, I started trying to figure out how to get closer to that during day to day development.  In a surprising twist, what I've found so far is the best way to go fast is to go slow.  That's kind of a cliche, but it's overwhelmingly true.  If I give myself the time to think things through, I waste a lot less time in starts and stops and blind alleys.  And if I take just 1 second longer to fully visualize all the steps of what I'm about to do, I'm able to execute it faster, smoother, and with a lot less stress.

--

We recently re-did our office arrangement.  We tore the walls down and made sure everyone was sitting with their teams.  There have been some nice benefits.  For one thing, it's way more fun.  There are many times when spontaneous design and organization decisions are made just because everyone can hear you.  And I think we've built a better sense of team in the process.

Of course there are downsides.  It can get noisy and be distracting, especially when random conversations and jokes break out.  I think it's just human nature to have this desire to not be left out of conversation.  You can put in headphones, but I find sometimes even music is enough of a distraction that I can't get my thoughts straight.  And because I don't want to be left out, I usually keep the volume just low enough so I can track what's going on around me.

So there is a trade-off with this open-space, everyone-together layout.  You gain some productivity in instantaneous meeting-less decisions.  You gain some camaraderie and some fun.  But you can't close the door and shut out the world so you can fully focus when you need to.  I'm still not sure how I feel on this one.  The focus- and productivity-obsessed part of me likes Peopleware's advice of everyone in their own office with a door.  But the social part of me likes the team room model.

Thursday, December 1, 2011

Powershell: Extracting strings from strings

I was playing with NCrunch.  It didn't work for our solution due to some bug w/ named parameters in attributes.  So I removed it.  But it left behind all kinds of little .xml files.  I could see these files in hg st as "?"'s and I wanted to remove them.

So I used this simple powershell command:
hg st | %{ [regex]::match($_, ". (.*)") } | %{ $_.Groups[1].Value } | %{ rm $_ }
The regex captures the name of the file, skipping the "? " at the beginning of the line.  The Groups[1].Value extracts that file name.  And rm removes it.

That version uses the .NET regex class directly and pipes the Match object it outputs to the next block.  You can make this shorter, though slightly more confusing in some ways, using PowerShell's -match operator:
hg st | %{ $_ -match ". (.*)" } | %{ rm $matches[1] }
This is using the magic $matches variable which is set to the results of the last executed -match operator.  The reason I say this is slightly more confusing is that it depends on the order of execution of the pipeline.  This wouldn't work if the pipeline ran all the -match's and then ran all the $matches.  But because the pipeline is executing each %{} block once for each object, it does work.

If hg output objects instead of text, this would have been much easier.  But this shows how you can lean on regexes when you have to deal with strings.
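
Since the first version is really just the .NET Regex class, the same capture-group extraction can be sketched in C#.  The names and sample lines below are made up for illustration.  One deliberate variation: the pattern is anchored to lines starting with "? ", so only unknown files are picked up, whereas the ". (.*)" pattern above accepts any status letter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HgStatusParser
{
    // Returns the path from each "? <path>" line and ignores every other status.
    public static List<string> ExtractUnknownPaths(IEnumerable<string> statusLines)
    {
        var paths = new List<string>();
        foreach (string line in statusLines)
        {
            // Group 1 captures everything after the "? " prefix.
            Match m = Regex.Match(line, @"^\? (.*)$");
            if (m.Success)
            {
                paths.Add(m.Groups[1].Value);
            }
        }
        return paths;
    }

    public static void Main()
    {
        // Hypothetical hg st output.
        var lines = new[] { "M src/Foo.cs", "? leftover1.xml", "? sub/leftover2.xml" };
        foreach (string path in ExtractUnknownPaths(lines))
        {
            Console.WriteLine(path); // leftover1.xml, then sub/leftover2.xml
        }
    }
}
```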

Friday, September 23, 2011

Powershell and Hg Magic

I moved a bunch of files that were in an hg repo and did an hg addremove -s 100.  They all should have been recorded as renames, but hg summary showed me that 1 of them wasn't.  But which one?

Powershell to the rescue!
$s = (hg st -a -C) -join "`n"
[regex]::matches($s, '^A.*$\n^A.*', "Multiline")
Let's break this down:
  • hg st -a -C: lists all added files including what file they were copied from.  Hg st considers a rename to be a copy and a remove.  For each renamed file this will output two lines:
    A <file\path\here>
      <copied\from\path\here>
  • $s = (...) -join "`n": takes the array of strings resulting from the hg st command and joins it into one big string in the $s variable.
  • [regex]::matches($s, '...', 'Multiline'): Runs a multiline regex on the string
  • '^A.*$\n^A.*': Regex matches a line that starts with an A, followed by anything to the end of the line, followed by a line break, followed by another line that starts with A, followed by anything.  In other words, this will match if two consecutive lines of output both start with A.  In this case, that means the first line is the line that was not recorded as a rename!
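
The same regex can be tried in C# against some made-up output.  This is just a sketch: the file names are hypothetical, and the input is assumed to be joined with \n, like the -join above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UnrenamedFinder
{
    // Returns the first line of every pair of consecutive lines that both start with "A".
    public static List<string> FindAddedWithoutCopySource(string hgOutput)
    {
        var found = new List<string>();
        // Multiline makes ^ and $ match at the start and end of every line.
        foreach (Match m in Regex.Matches(hgOutput, @"^A.*$\n^A.*", RegexOptions.Multiline))
        {
            found.Add(m.Value.Split('\n')[0]);
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical "hg st -a -C" output: a.txt and c.txt were recorded as
        // renames (each has an indented copy source line), b.txt was not.
        string output = string.Join("\n", new[]
        {
            "A new/a.txt",
            "  old/a.txt",
            "A new/b.txt",
            "A new/c.txt",
            "  old/c.txt",
        });

        foreach (string line in FindAddedWithoutCopySource(output))
        {
            Console.WriteLine(line); // A new/b.txt
        }
    }
}
```

Because each match consumes both lines, this works best when you're hunting for a single straggler like I was.  An added file on the very last line of output has no following line, so it would not be reported.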

Tuesday, August 23, 2011

.NET is Stale?

Here's dhh on twitter: "Wish someone would study the cultural inhibitions in Denmark that binds it to stale, conservative platforms like .NET"

.NET is stale?  Fuck you!

Not to mention the language features of C#:

Is C# the most elegant language ever invented?  No, but it is one of the most elegant I have used, especially for a statically typed language.  And the language itself is clearly one of the most advanced available.  This is stale?

Did all of these ideas originate in .NET?  No, but what the hell difference does that make?!  The .NET community finds and adopts the best ideas, whether they started in Java, Ruby, or Python.  This is stale?

Are there companies still using .NET 2.0 and little to no open source software?  Yes, but there are also companies on the bleeding edge, using all the tools listed above.  From organizations with strict upgrade guidelines, to organizations that wait for the first service pack, to organizations that go to production on beta releases.  You'll find it all in the .NET community.  This is stale?

Ruby is a joy to program in.  Dynamic languages are more fun to do TDD with.  Percentage-wise, I'm sure more Ruby programmers participate in the open source community.  There is a wide array of really great things about Ruby (and Python, etc etc).  There are also plenty of shitty things (poor backwards compatibility, poor documentation, poor tutorials, elitist attitude, etc etc).

But this bullshit attitude that .NET is stale, outdated, joyless, or somehow dramatically inferior is nothing but short sighted and stupid.  Get over your buyer's remorse and go build some software that contributes to something larger than yourself.

* Did I leave off your favorite fresh .NET tool or feature?  Leave it in the comments.