Posts tagged programming
So you REALLY don’t know regular expressions?
Ever since I started my new job, I’ve noticed a curious phenomenon. I work with two wonderfully gifted programmers who both know PHP much better than I do, and I learn something new from them all the time. However, neither one of them really knows or uses regular expressions.
Now, as I learned Perl before I learned PHP, naturally I learned regular expressions quite early on in that process. In Perl, regular expressions are a huge part of the language – you simply cannot get away without learning them to some extent as they are used extensively in so many parts of the language.
Apparently I’m not the only one to notice this. Here’s a quote I found on Stack Exchange:
In earlier phases of my career (ie. pre-PHP), I was a Perl guru, and one major aspect of Perl gurudom is mastery of regular expressions.
On my current team, I’m literally the only one of us who reaches for regex before other (usually nastier) tools. Seems like to the rest of the team they’re pure magic. They’ll wheel over to my desk and ask for a regex that takes me literally ten seconds to put together, and then be blown away when it works. I don’t know–I’ve worked with them so long, it’s just natural at this point.
In the absence of regex-fluency, you’re left with combinations of flow-control statements wrapping strstr and strpos statements, which gets ugly and hard to run in your head. I’d much rather craft one elegant regex than thirty lines of plodding string searching.
While I would hesitate to call myself a Perl guru (at best I would call myself intermediate with Perl), I would say I know enough about regular expressions that I can generally get useful work done with them.
Take the following example in Perl (edited somewhat as it didn’t play nice with TinyMCE):
$fruit = "apple,banana,cherry";
print $fruit."\n";
@fruit = split(/,/,$fruit);
foreach(@fruit){print $_."\n";}
apple,banana,cherry
apple
banana
cherry
Now, this code should be fairly easy to understand, even if you don’t really know Perl. $fruit is a string containing “apple,banana,cherry”. The split() function takes two arguments: a regular expression defining the character(s) that separate the parts of the string you want to put into an array, and the string you want to split. This returns the array @fruit, which consists of three strings, “apple”, “banana”, and “cherry”.
In PHP, you can do pretty much the same thing, using the explode() function:
$fruit = "apple,banana,cherry";
echo $fruit."\n";
$fruitArray = explode(",",$fruit);
foreach($fruitArray as $fruitArrayItem)
{
echo $fruitArrayItem."\n";
}
apple,banana,cherry
apple
banana
cherry
As you can see, they work in pretty much the same way here. Both return basically the same output, and the syntax for using the appropriate functions for splitting the strings is virtually identical.
However, it’s once things get a bit more difficult that it becomes obvious how much more powerful regular expressions are. Say you’re dealing with a string that’s similar to that above, but may use different characters to separate the elements. For instance, say you’ve obtained the data that you want to pass through into an array from a text file and it’s somewhat inconsistent – perhaps the information you want is separated by differing amounts and types of whitespace, or different characters. The explode() function simply won’t handle that (at least, not without a lot of pain). But with Perl’s split() function, that’s no problem. Here’s how you might deal with input that had different types and quantities of whitespace as a separator:
@fruit = split(/\s+/,$fruit);
Yes, it’s that simple! The \s metacharacter matches any type of whitespace, and the + quantifier means that it will match one or more times. Now you can very easily convert the contents of that string into an array.
Or say you want to convert an entire string of text, with all kinds of punctuation and whitespace, into an array, but only keep the actual words. This wouldn’t be practical with explode(), but with split() it’s easy:
@fruit = split(/\W+/,$fruit);
The \W metacharacter matches any non-word character (ie anything other than a-z, A-Z, 0-9 or the underscore), and again the + quantifier means that it will match one or more times.
And of course, regular expressions are useful for many more tasks than this that, while possible with most languages’ existing string functions, can get very nasty quite quickly. Say you want to match a UK postcode to check that it’s valid (note that for the sake of simplicity, I’m going to ignore BFPO and GIR postcodes). These use a format of one or two letters, followed by one digit, then may have an additional digit or letter, then a space, then a digit, then two letters. This would be a nightmare to check using most languages’ native string functions, but with a regex in Perl, it’s relatively simple:
my $postcode = "NR1 1NP";
if($postcode =~ m/^[a-zA-Z]{1,2}\d{1}(|[a-zA-Z0-9]{1})(|\s+)\d{1}[a-zA-Z]{2}$/)
{
print "It matched!\n";
}
And if you wanted to return the first part of the postcode if it matched as well, that’s simple too:
my $postcode = "NR1 1NP";
if($postcode =~ s/^([a-zA-Z]{1,2}\d{1}(|[a-zA-Z0-9]{1}))(|\s+)\d{1}[a-zA-Z]{2}$/$1/)
{
print "It matched! $postcode\n";
}
Now, you may say “But that’s in Perl! I’m using PHP!”. Well, regular expressions are an extremely powerful and useful part of PHP; they’re just not as central to the language as they are in Perl. PHP actually has two distinct types of regular expressions – POSIX-extended regular expressions, and Perl-compatible regular expressions (or PCRE). However, POSIX-extended regular expressions were deprecated from PHP 5.3 onwards, so it’s not really worth taking the time to learn them when PCRE will do exactly the same thing and is going to be around for the future. Furthermore, most other programming languages also support Perl-compatible regular expressions, so they’re fairly portable between languages. In other words, if you learn how to work with regular expressions in Perl, you can very easily transfer that knowledge to most other programming languages that support regular expressions.
In the first example given above, we can replace explode() with preg_split, and the syntax is virtually identical to split() in Perl, with the only difference being the name of the function and that the pattern to match is wrapped in double quotes:
$fruit = "apple,banana,cherry";
echo $fruit."\n";
$fruitArray = preg_split("/,/",$fruit);
foreach($fruitArray as $fruitArrayItem)
{
echo $fruitArrayItem."\n";
}
apple,banana,cherry
apple
banana
cherry
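The two messier Perl examples from earlier (splitting on runs of whitespace, and keeping only the words) carry over in much the same way. Here is a minimal sketch; the sample strings are made up for illustration, and the PREG_SPLIT_NO_EMPTY flag is my own addition to drop any empty strings caused by leading or trailing separators. The patterns are in single quotes so that PHP leaves the backslashes alone.

```php
<?php
// Hypothetical input: items separated by a mix of spaces, tabs and newlines.
$fruit = "apple  banana\tcherry\n date";

// Split on one or more whitespace characters; -1 means "no limit".
$byWhitespace = preg_split('/\s+/', $fruit, -1, PREG_SPLIT_NO_EMPTY);

// Hypothetical sentence: keep only the words, discarding punctuation and spaces.
$sentence = "Apples, bananas; cherries... and dates!";
$words = preg_split('/\W+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);

foreach ($byWhitespace as $item) {
    echo $item . "\n";
}
foreach ($words as $word) {
    echo $word . "\n";
}
```

Without PREG_SPLIT_NO_EMPTY, the trailing “!” in the second string would leave an empty element at the end of the array.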
Along similar lines, if we want to check if a string matches a pattern, we can use preg_match(), and if we want to search and replace, we can use preg_replace(). PHP’s regular expression support is not appreciably poorer than Perl’s, even if it’s less central to the language as a whole.
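As a rough sketch of what that looks like, here is the postcode check from the Perl example redone with preg_match() and preg_replace(). The pattern is a lightly tidied version of the Perl one (I’ve assumed the last two characters should be letters, as the description of the format says), so treat it as an illustration rather than a complete postcode validator.

```php
<?php
$postcode = "NR1 1NP";

// Simplified UK postcode shape, ignoring BFPO and GIR postcodes.
// The parentheses capture the first (outward) part of the postcode.
$pattern = '/^([a-zA-Z]{1,2}\d[a-zA-Z0-9]?)\s*\d[a-zA-Z]{2}$/';

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match($pattern, $postcode, $matches) === 1) {
    // $matches[0] is the whole match; $matches[1] is the first capture group.
    echo "It matched! " . $matches[1] . "\n";
}

// preg_replace() plays the role of Perl's s///; $1 refers to the first group.
$outward = preg_replace($pattern, '$1', $postcode);
echo $outward . "\n";
```

Unlike Perl’s s///, preg_replace() returns the new string rather than changing $postcode in place, so the original value is still available afterwards.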
But regular expressions are slower than PHP’s string functions!
Yes, that’s true. So it’s a mistake to use regular expressions for something that can be handled quickly and easily using string functions. For instance, if in the following string you wanted to replace the word “cow” with “sheep”:
The cow jumped over the moon
You could use something like this:
$text = "The cow jumped over the moon";
$text = preg_replace("/cow/","sheep",$text);
However, because here you are only looking to match literal characters, you don’t need to use a regular expression. Just use the following:
$text = str_replace("cow","sheep",$text);
But, if you have to do some more complex pattern matching, you have to start using strpos to get the location of specific characters and returning substrings between those characters, and it gets very messy, very quickly indeed. In those cases, while I haven’t done any kind of benchmarking on it, it stands to reason that quite quickly you’ll reach a point where a regex would be faster.
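To give a feel for how quickly the string-function route gets fiddly, here is a small sketch that pulls the text between square brackets out of a line, first with strpos() and substr() and then with a single pattern. The log line is invented for the example, and this says nothing about speed; it only compares how the two approaches read.

```php
<?php
// Hypothetical log line, used purely for illustration.
$line = "2012-07-01 ERROR [db] Connection refused";

// String-function approach: find each bracket, then cut out what lies between.
$start = strpos($line, '[');
$end = ($start === false) ? false : strpos($line, ']', $start);
$viaStrpos = null;
if ($start !== false && $end !== false) {
    $viaStrpos = substr($line, $start + 1, $end - $start - 1);
}

// Regex approach: "[", then anything that isn't "]", then "]".
$viaRegex = (preg_match('/\[([^\]]*)\]/', $line, $m) === 1) ? $m[1] : null;

echo $viaStrpos . "\n";
echo $viaRegex . "\n";
```

Both give the same answer here, but each extra rule (optional spaces, several bracketed sections, and so on) adds more branches to the first version and usually only a few characters to the second.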
However, for a number of common tasks, such as validating email addresses and URLs, there’s another way and you don’t need to resort to regular expressions, or faffing about with loads of string functions. The filter_var() function can be used for validating or sanitising email addresses and URLs, among other things, so this is worth using instead of writing a regex. If you’re using a framework such as CodeIgniter, you may have access to its native functions for validating this kind of thing, so you should use those instead.
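A minimal sketch of filter_var() in use, with made-up addresses as input: the validate filters return the value itself if it passes and false if it doesn’t, while the sanitise filter strips out characters that aren’t allowed in an email address.

```php
<?php
// Hypothetical inputs for illustration.
$email = "someone@example.com";
$url = "http://example.com/path";

// Validation: returns the value on success, or false on failure.
$validEmail = filter_var($email, FILTER_VALIDATE_EMAIL);
$validUrl = filter_var($url, FILTER_VALIDATE_URL);
$badEmail = filter_var("not an email", FILTER_VALIDATE_EMAIL);

// Sanitising: removes characters that cannot appear in an email address,
// such as the stray spaces here. It does not check the result is valid.
$cleaned = filter_var(" someone@example.com ", FILTER_SANITIZE_EMAIL);

var_dump($validEmail !== false);
var_dump($validUrl !== false);
var_dump($badEmail !== false);
echo $cleaned . "\n";
```

Because a failed validation returns false, compare the result with === false or !== false rather than relying on a loose truthiness check.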
But regular expressions are ugly and make for less readable code!
Not really. They seem intimidating to the newcomer, and very few people can just glance at a regex and instantly know what it does. But with regexes, you can often do complex things in far fewer lines of code than would be needed to accomplish the same thing using just PHP’s string functions. If you can do something in a line or two using string functions, it’s probably best to do that. But after that, things go downhill very quickly.
Once you learn them, regular expressions really are not that hard, and you’ll probably find enough things to use them for that you’ll get plenty of practice at them. They’re certainly more readable to anyone with even a modicum of experience using them than line after line of flow-control statements.
But you shouldn’t be using regular expressions for parsing HTML or XML!
Quite true. Regular expressions are the wrong tool for that. You should probably use an existing library of some kind for that.
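For instance, PHP’s bundled DOM extension is one option (my choice for this sketch, not the only library that would do). Assuming that extension is enabled, collecting every link in a made-up HTML fragment looks something like this:

```php
<?php
// Hypothetical HTML fragment for illustration.
$html = '<p>See <a href="http://example.com/one">one</a> and '
      . '<a href="http://example.com/two">two</a>.</p>';

// Collect libxml's parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <a> element and read its href attribute.
$links = array();
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

foreach ($links as $link) {
    echo $link . "\n";
}
```

The parser copes with nesting, attribute order and quoting styles on its own, which is exactly where a hand-written regex tends to fall over.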
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
Ah, yes, surely one of the most misused quotes on the web! Again, regular expressions are not the right tool for every job, and there are a lot of tasks they get used for that, quite frankly, they shouldn’t be. Most of us who know regular expressions have been known to use them for things we probably shouldn’t (I actually only just stumbled across filter_var, so I’ve done my share of validating email addresses using regexes, and I’m as guilty as anyone else of overusing them). But there’s still plenty of stuff you should use them for when what you need to do can’t be accomplished quickly and easily using string functions.
Regular expressions are not inherently evil. They’re a tool like any other. What is bad is using them for things where a simple alternative exists. However, they are still extremely useful, and there are plenty of valid use cases for them.
Github
To date, Subversion is the single versioning system I have the most experience with. I use it at work, and I was already somewhat familiar with it beforehand. However, with all the buzz over Git over the last few years, it’s always been tempting to explore that as an alternative.
I’ve had a Github account for over a year, but had as yet not added anything to it. However, today that changed. I’ve had a rather haphazard approach towards my .vimrc and other Vim configuration files for a while, with the result that they tend to be less than consistent across different machines. I’ve seen that a fair number of people put their Vim configuration files under version control, and that seemed like an effective solution, so I’ve gotten my .vimrc and .vim into a respectable state and added them to a new repository. Now I should have no excuse for letting them get out of sync.
I have to say, Github is a truly wonderful service. The tutorials for getting started with Git are really good, and make it easy to get started. It’s probably one of the main reasons why Git is becoming more and more popular – there isn’t really anything comparable for Subversion.
What makes a good programming textbook?
I own a lot of programming textbooks. I went through a long phase of buying ones about virtually any technology I was even remotely interested in, therefore I own loads of books about Perl, Ruby, Python, PHP and C, among others. Granted, with many of them I’ve done little more than flick through them (I find it’s hard to get round to learning things like that without some kind of plan, which was what made me eventually start doing a more formal course since it forced a plan on me), but I’ve seen quite a few.
But of course, not every textbook is equal. Some are great, truly seminal works that are raved about by well-known programmers. Examples include the Camel book (Programming Perl) and K & R (The C Programming Language). Others are rarely mentioned. But what makes a really good textbook? Here I’m going to list some of the attributes that I’ve found in my favourite and most effective programming texts, and that I think make for a good, effective and informative textbook that makes a good job of getting you up and running programming in a new language:
- Lots of working examples to enter – To learn to program, whether from scratch or in a brand new language, the best advice I’ve ever heard was that you need to read a lot of code, and write a lot of code. I find that, at least at the start, nothing helps me learn to code in a new language better than lots of examples for me to type in and run, in order to pick up the basic syntax and keywords of the language. After all, that’s how many people used to learn BASIC, by typing in listings from magazines, and it’s how you learn English as a child – you’re exposed to the language, and you copy it, then understanding comes later. One of the best examples of this is C All-in-One Desk Reference For Dummies, by Dan Gookin – it’s packed full of loads of great example programs to enter and run that demonstrate the basic concepts well in C.
- Maintains your interest by showing you how to do interesting things – Not many people are interested in learning a new programming language to do something tedious (that said, if someone already has to do something tedious, such as a task at work, teaching them how to write a program to do it for them may well be considerably more interesting for them than doing the task themselves, hence the popularity of scripting languages for automating dull tasks), so a good programming textbook needs to show the learner how to do something interesting. Games are an obvious example, but they can get a bit much – how many different versions of Hangman do people want to create? Simple web apps are also an option with many programming languages. If something needs to be more utilitarian, then if possible it should be genuinely useful for solving a problem (the programmer doesn’t necessarily need to have this problem, they just need to see how to create a program to fix it). Frivolous little scripts that do things like recite “99 Bottles of Beer” to demonstrate for loops have their place, but that place is near the start only – by the end a programmer wants to be able to write useful programs.
- Good exercises to stretch the reader – Many textbooks will have additional exercises for the reader at the end of each chapter that allow them to practise their skills and ensure they aren’t just copying a listing, but are genuinely capable of writing code from scratch in the language. These are effectively the “homework” assignments, and I’ve found that these can be far more important in teaching me how to use the language well for actual programming projects than the listings within the book.
These are my thoughts, but I’d be interested to read what other people think about this issue. What’s the best programming textbook you’ve ever used, and why do you like it? What do you think a good programming textbook should have?
Perl after Python
I’m currently studying for the CIW Web Developer qualification, and having passed the exams for database design and JavaScript, I’m now on to the third component, Perl. I figured that, having already picked up a reasonable grasp of another scripting language (namely Python), I would have no trouble picking up Perl quickly, as happened when I learned JavaScript.
Unfortunately, it hasn’t quite worked out as well as I’d hoped so far, and in a number of ways. First of all, it doesn’t seem to “fit your brain” quite as easily as Python does – I find that the significant number of non-alphanumeric characters used makes it less intuitive than Python, at least for me. I’m also not a great fan of the syntax – in particular, I really am not keen on the syntax used for object-oriented programming. In general I’m finding it a struggle to pick up many things I learned quite quickly in Python.
That said, Perl has plenty of awesome features. CPAN has a staggering number of modules available, and makes it very easy to install them. And of course, its support for regular expressions is second to none. Don’t get me wrong, it’s a language I really want to know better and be able to use well, but I am finding it quite hard going compared to Python.
I strongly suspect, however, that it may well be, at least in part, because I learned Python first and my brain is used to the Pythonic way of doing things, therefore I’m having to unlearn those habits for Perl. Has anyone else learned Python first and then struggled to pick up Perl, or is it just me? Does learning Python first predispose you to finding Perl more difficult?
Staying productive in summer
Unless you live in the southern hemisphere, summer’s here. Right now it’s gone midnight but it’s still very hot, and being in the UK, where home air conditioning is not common, there’s little way to alleviate the heat besides opening the window (which you don’t want to do because people are continually having barbecues).
I have to admit to just not being a summer person – I don’t really like hot weather or outdoor pursuits in general. I’m happier in spring and autumn when it’s cool, but not cold, and I can wear my favourite Animal hoodie if it does turn chillier. When it gets hot I find it extremely difficult to get anything done unless I can do so in a fully air-conditioned environment, and have a lot of trouble sleeping at night, exacerbating the problem due to tiredness. Unfortunately, I’m now having to learn Perl from scratch during this time (a fairly daunting prospect at the best of times!), and it’s a bit of a nightmare trying to actually sit down and learn regular expressions properly when it’s hot and stuffy and you can’t think straight.
I’ll have to try going to a nice cool air-conditioned cafe with my Dell Mini, get a cold drink and see if I can get some work done that way. But I’m curious to know if anyone else has any good tips for remaining productive at learning a new programming language during hot weather that they’d like to pass on?
Why you should try Vim
I’m a huge fan of the Vim text editor. I have the key bindings burned so deep into my head I keep reaching for the Escape key at work when I want to move around in a document in Microsoft Word, or hitting J to try and move down. In short, I’m incurably hooked on this wonderfully powerful text editor, and if you’re still using something like gedit, TextEdit, or (god forbid!) Notepad, then I want to get you to think hard about switching to Vim!
I first started using Vim nearly two years ago. At the time, I had a fair grasp of HTML, but hadn’t really gotten into programming as such. I had my Eee PC 2G Surf with me most of the time, but didn’t have regular access to the Internet, and didn’t have a full-sized laptop available for much of the time. The main text editor I’d used to date was Kate in KDE3.5.
One day I decided that, for lack of anything else to do, I was going to run through the Vim tutorial (accessed by entering vimtutor) in a terminal on my Eee PC. It was weird to start with, but I soon got used to the unusual-seeming key bindings. As a touch-typist, Vim worked really well for me since it meant I didn’t have to move my hands off the keyboard at all, and the arcane-sounding keys soon became second nature. When I started work on the Site Development Foundations part of my CIW Foundation course, naturally I used Vim, and it worked well for both HTML and CSS documents, and of course the more I used it the more proficient at it I became.
Then when I first started learning Python, Vim really came into its own. The syntax highlighting is a real help, it’s extremely fast to move around in and edit a document, and the autocompletion, while perhaps not quite as good as that in a language-specific IDE, was good enough for most purposes. I’ve since used Vim for coding in HTML, CSS, Python, JavaScript and C, as well as editing configuration files in various Unix-like operating systems, and it’s been an excellent editor for all of these. I’ve barely scratched the surface of what it can do, and I already couldn’t imagine using anything else.
So, why should you use Vim? Here are just a few of the reasons.
Vim is everywhere
If you’re running Mac OS X, a CLI-only version of Vim is included, and you can get a graphical version called MacVim as well if you need it. Most Linux distributions include either Vim or another vi clone by default, and if not it’s available from your distribution’s repositories. If you’re running another Unix flavour, again you almost certainly have Vim or another vi clone, and if not you can get one. And if you’re on Windows you can grab a copy too. If you use a text editor like Kate, gedit, or so on, then you can’t guarantee you can get it on other platforms. With Vim you can.
Also, the fact that Vim is a CLI application means that even if you have to edit something via SSH or Telnet, you still have access to a text editor you know well and can work just as well as you would with a GUI.
Vim is flexible
If you’re using Vim, you can rely on it to edit files in virtually any programming or markup language you like, making it easy to adapt. Learning Ruby? You can do it in Vim. Now you want to learn Java? Again, Vim will do the job. By allowing you to use a familiar environment for virtually any programming language you may want to learn, Vim means you’ll be productive quicker in a new language than you would be if you had to use a different text editor to the one you’ve used before.
Vim is fast
If you know how much faster touch-typing is than hunt-and-peck typing, then you’ll have some idea of why Vim is faster than regular typing. Because Vim uses the home row for navigation, and in general is designed so you move your fingers as little as possible, it’s faster than just about any other text editor you can name. The key bindings are deceptively simple to remember for the most part, as it’s your fingers that need to remember them, not your brain.
Vim doesn’t require the use of a mouse to navigate, nor does it require you to move your hand to the cursor keys. It also lets you repeat a movement as many times as you want by typing a count first – for instance, to go down 9 lines, you just enter 9j. And it’s easy to search for specific words and navigate to them.
Vim is easy to customise
By editing your .vimrc configuration file, you can easily modify how Vim works for your own needs. You can easily change settings to suit your working habits better, such as setting it to work with a mouse, adding line numbers, changing the colour scheme for syntax highlighting, changing the key bindings etc. Vim can also be extended by use of scripts and plugins. Once you have it set up the way you want, it’s easy to move all your settings to another computer.
Vim always has a way to make things easier
One of the best things about Vim is that, because it’s a solid, mature product, someone’s usually thought of a way to do whatever you want to do, quickly and easily. For instance, recently I was writing a JavaScript function to rate passwords for security, and I decided to get a list of bad passwords off the Web. I soon found one, but it was in excess of three thousand entries long, and I had to edit it to put them all into an array as individual strings, enclosed by quotes and separated by commas, ideally each on a line by themselves. This would have been an incredibly tedious task if I had to do it manually, so I did a little digging and discovered how to create a macro in Vim. This made it trivial to perform this task with only a few keypresses.
Vim is tightly integrated with the command line
Vim makes it easy to run other shell commands without leaving, by entering :!, followed by the command you want to run from the shell. This means it’s easy to run or compile a program you’ve just written without leaving Vim, so you don’t lose your place. When you’re done, you’re sent straight back into Vim, exactly where you left off.
Vim behaves like a GUI application
Yes, Vim may normally be a command-line application, but it still manages to pack in many of the niceties of graphical applications. You can split the screen horizontally or vertically, or open new files in new tabs. It’s even possible to use it with a mouse in most cases.
These are just a few of the reasons why I love Vim, and if you haven’t already tried it, or if you’re still using a less-powerful text editor, then I urge you to give it a go. Yes, the learning curve can be a bit steep, but it’s well worth it in terms of boosting your productivity. It works well for hand-coding HTML and CSS, or for programming in almost any language you can think of. You can get started today – if you’re using Linux or Mac OS X, it’s almost certainly already there waiting for you, and on Windows it’s just a download away. Try launching the tutorial by entering vimtutor in the shell, and work through it, and you’ll find yourself getting used to it surprisingly quickly. Or why not try Cream, essentially a preconfigured version of Vim that has a shallower learning curve? If you use a text editor at all, you really should give Vim a try.