.NET Interview Questions - Part 3
December 02, 2007 • 7:56PM
I received such an overwhelming response to my last two blog posts on .NET interview questions that I decided to post a third.
Part 1 can be found here.
Part 2 can be found here.
Continuing from where we left off...
6. If placed in the Page_Load method of an ASP.NET page, what will the following code output?
Response.Write("<br />Before");
try
{
Response.Write("<br />In the 'try'");
int i = 0;
int j = 1 / i;
}
catch
{
Response.Write("<br />In the 'catch'");
Response.End();
return;
}
finally
{
Response.Write("<br />In the 'finally'");
}
Response.Write("<br />After");
Pretty simple question, right? Wrong!
I got it wrong the first time around too, and even for this blog post I made sure to run the code and check the results!
You would see the following:
Before
In the 'try'
In the 'catch'
In the 'finally'
Remember that the finally clause will execute without exception (no pun intended). I tried to really drive that home by first calling Response.End, which itself throws a second exception (a ThreadAbortException), and then adding a return statement, in an attempt to leave the currently executing method.
Whichever way control leaves the catch block, the finally clause still executes before the method exits, and because the method does exit, the word "After" is never displayed.
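To see the same ordering outside of ASP.NET, here is a minimal sketch that records each step in a List<string> instead of calling Response.Write. It leaves out Response.End, since that call is specific to System.Web; the return in the catch block alone is enough to show that finally runs before control leaves the method. FinallyDemo and PageLoad are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Page_Load example above: output is
// collected in a list rather than written to an HTTP response.
public static class FinallyDemo
{
    public static List<string> Run()
    {
        var output = new List<string>();
        PageLoad(output);
        return output;
    }

    private static void PageLoad(List<string> output)
    {
        output.Add("Before");
        try
        {
            output.Add("In the 'try'");
            int i = 0;
            int j = 1 / i;              // throws DivideByZeroException at runtime
            output.Add(j.ToString());   // never reached
        }
        catch (DivideByZeroException)
        {
            output.Add("In the 'catch'");
            return;                     // leave the method...
        }
        finally
        {
            output.Add("In the 'finally'"); // ...but this still runs first
        }
        output.Add("After");            // skipped because of the return
    }
}
```

Calling FinallyDemo.Run() returns the four entries Before, In the 'try', In the 'catch' and In the 'finally', and no After.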
7. Write a script to generate a dynamic image on a webpage, for example to serve as a CAPTCHA, to place a watermark on an image, or to check the referring URL of a requested image.
For my example, I'll display 10 randomly selected characters, each in a random font and size, on a red background. Note that I'm not going to introduce any warping, background noise or other security features. This code is not intended for use as a real CAPTCHA, and it would be trivial to write an OCR script to attack it.
I'm going to present the whole block of code without much discussion. Most of the work is done by the GDI+ classes in System.Drawing, which you can easily look up on MSDN. The code goes in the page's Load handler (Page_Load or an OnLoad override), and the page is then requested through an img element in the HTML, like:
<img src="CaptchaImage.aspx" />
Note that we have previously defined the following helper structure to avoid repeated boxing/unboxing:
struct CaptchaCharacter
{
public char character;
public Font font;
}
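The rest of the code follows (it relies on using directives for System.Drawing, System.Drawing.Drawing2D and System.Drawing.Imaging):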
int width = 600;
int height = 400;
int number_of_characters = 10;
string character_choices = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
//NUMBERS 0+1, LETTERS I+O removed for legibility reasons
string[] font_families = { "Tahoma", "Arial", "Verdana" };
int[] font_sizes = { 36, 60, 84, 108 };
Rectangle bmp_rect = new Rectangle(0, 0, width, height);
Bitmap bmp = new Bitmap(width, height);
Graphics graphics = Graphics.FromImage(bmp);
graphics.SmoothingMode = SmoothingMode.AntiAlias;
graphics.FillRectangle(Brushes.Red, bmp_rect);
CaptchaCharacter[] character_array = new CaptchaCharacter[number_of_characters];
Random rnd = new Random();
for (int x = 0; x < number_of_characters; x++)
{
CaptchaCharacter new_char = new CaptchaCharacter();
new_char.character = character_choices[rnd.Next(0, character_choices.Length)];
new_char.font = new Font(font_families[rnd.Next(0, font_families.Length)],
font_sizes[rnd.Next(0, font_sizes.Length)]);
character_array[x] = new_char;
}
StringFormat format = new StringFormat();
format.Alignment = StringAlignment.Center;
format.LineAlignment = StringAlignment.Center;
GraphicsPath path = new GraphicsPath();
for (int a = 0; a < number_of_characters; a++)
{
RectangleF rect = new RectangleF((width / number_of_characters) * a,
0,
width / number_of_characters,
height);
path.AddString(character_array[a].character.ToString(),
character_array[a].font.FontFamily,
0,
character_array[a].font.SizeInPoints,
rect,
format);
}
graphics.FillPath(Brushes.Black, path);
Response.ContentType = "image/gif";
bmp.Save(Response.OutputStream, ImageFormat.Gif);
//we dispose all the GDI+ objects, including the StringFormat
for (int z = 0; z < character_array.Length; z++)
character_array[z].font.Dispose();
format.Dispose();
path.Dispose();
graphics.Dispose();
bmp.Dispose();
First a Bitmap object is created, which is what we will eventually output. After obtaining its Graphics object, we begin drawing on it. A background rectangle is filled with a red brush, and then a GraphicsPath object is created. We can use the built-in AddString method of the GraphicsPath to easily style and add our characters. We could easily have added the whole string at once, but we loop through the characters to give each one its own FontFamily and font size. Finally, we set the ContentType of the response and save the bitmap to the response's OutputStream. Because the image bytes go into the same response as the page, the .aspx file itself should contain no markup; anything else it renders would be appended to the image data.
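One thing the listing above never does is remember which characters it drew, so a real CAPTCHA would have nothing to check the visitor's answer against. A minimal sketch, assuming you generate the text first, keep it server-side (for example in Session under a key of your choosing) and then draw each character of that string in the loop above, might separate generation from drawing like this. CaptchaText, Generate and Verify are hypothetical names, and System.Random is fine for a demo but is not a cryptographically secure generator.

```csharp
using System;
using System.Text;

// Hypothetical helper: produces the CAPTCHA text and checks the answer.
public static class CaptchaText
{
    // Same legible alphabet as the listing above (0, 1, I and O omitted).
    public const string Alphabet = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

    // Picks 'length' characters at random from Alphabet.
    public static string Generate(Random rnd, int length)
    {
        var sb = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            sb.Append(Alphabet[rnd.Next(0, Alphabet.Length)]);
        return sb.ToString();
    }

    // Case-insensitive comparison of the stored text with the visitor's answer.
    public static bool Verify(string expected, string answer)
    {
        return expected != null && answer != null &&
               string.Equals(expected, answer.Trim(), StringComparison.OrdinalIgnoreCase);
    }
}
```

The page that draws the image would call Generate once per request and store the result, and the page that receives the form post would call Verify with the stored value and the submitted answer.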
Lately, I've seen a lot of really bad SQL come through the office in interviews. In our extensive interview process, many of the other developers focus on simple SQL problems, which is really all that is necessary for the day-to-day job at Demand.
Unlike some of the other developers, I pay a lot of attention to security, and my boss constantly chastises me for worrying about it too much. I can't deny that I obsess over it, given my background, and because of that I'll occasionally ask the following question, which I think any SQL developer should be able to answer:
8. Given a simple login box (with username and password fields), what input will compromise the database in a susceptible system?
I'll even go so far as to show you the poorly written code that allows this. (Note that the query only fetches the password for the given username, and the C# code below does the comparison; that's all it takes to allow an exploit.)
string sql = string.Format(@"
SELECT
password
FROM [dbo].[Accounts]
WHERE username='{0}' ", Request.Form["username"]);
DataTable dt = new DataTable();
SqlConnection connection = new SqlConnection(connection_string);
SqlCommand command = new SqlCommand(sql, connection);
command.CommandType = CommandType.Text;
connection.Open();
SqlDataReader sdr = command.ExecuteReader(CommandBehavior.CloseConnection);
dt.Load(sdr);
sdr.Close(); //this will close the connection too
if (dt.Rows.Count > 0)
if (dt.Rows[0]["password"].ToString() == Request.Form["password"])
LoginUser();
First, the exploit. There are countless things you can do with SQL injection, but we'll use this simple input in the username field:
' AND 0=1 UNION SELECT '123456' --
and 123456 in the password field.
This turns the executed query into:
SELECT
password
FROM [dbo].[Accounts]
WHERE username='' AND 0=1 UNION SELECT '123456' --'
You'll notice that the -- at the end comments out the rest of the original query, including the closing single-quote. The WHERE clause is therefore interpreted as username='' AND 0=1. The AND 0=1 portion makes the condition false for every row, so the first SELECT returns nothing. The UNION then appends a single row containing the literal '123456', which the C# code reads back as the stored password and compares against the 123456 we typed into the password field, granting us access to the site. (Note that this is a very simple example; in a real attack you would more likely select back the matching user account and could theoretically log in to any account.)
Some may argue that I made the impossible possible by revealing the original source code, but that's not necessarily true. For anyone attempting SQL injection, it's not a large leap to write a script that brute forces the structure of the victim query. At that point, you can do almost whatever you want with a little ingenuity and the INFORMATION_SCHEMA views, which most RDBMSs support.
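The usual defence is to stop building SQL text out of user input and pass the value as a parameter instead. A minimal sketch, assuming the same [dbo].[Accounts] table with an NVARCHAR username column (the 256 length is a guess, so match it to your schema); AccountQueries and CreatePasswordLookup are hypothetical names:

```csharp
using System.Data;
using System.Data.SqlClient;

public static class AccountQueries
{
    // Same lookup as the vulnerable listing, but the username travels as a
    // parameter value instead of being spliced into the SQL text.
    public static SqlCommand CreatePasswordLookup(SqlConnection connection, string username)
    {
        var command = new SqlCommand(
            "SELECT password FROM [dbo].[Accounts] WHERE username = @username",
            connection);
        command.CommandType = CommandType.Text;
        // Assumed column type and length; adjust to the real schema.
        command.Parameters.Add("@username", SqlDbType.NVarChar, 256).Value = username;
        return command;
    }
}
```

You would execute the returned command exactly as in the original listing. With this version the injected text is simply treated as a very odd username, no row comes back, and the login fails.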
.NET Quickies
* Using a method of the String object, what is the optimized .NET way of performing the (often executed) compound conditional:
if (some_string != null && some_string != "")
DoSomething();
String.IsNullOrEmpty()
(in my tests for this blog entry, it consistently performed 40-45% faster)
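Since the original condition is true for a non-empty string, the IsNullOrEmpty call has to be negated. A tiny sketch (StringChecks and HasText are made-up names); in practice you would just write if (!String.IsNullOrEmpty(some_string)) DoSomething(); inline:

```csharp
public static class StringChecks
{
    // Equivalent to: some_string != null && some_string != ""
    public static bool HasText(string some_string)
    {
        return !string.IsNullOrEmpty(some_string);
    }
}
```

Note that a string containing only spaces is neither null nor empty, so HasText(" ") is true, just as with the original compound conditional.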
* When encoding data, what is the key overall difference between hashing and encrypting?
Hashing is a one-way mapping, while encryption has a corresponding decryption which will reverse the process.
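To make the difference concrete, here is a minimal sketch using SHA-256 for hashing and AES for encryption. Those two algorithms are my choice for illustration, not something the question requires, and the key/IV handling is deliberately simplistic (in real code the key must be stored securely and a fresh IV used for each message). HashVsEncrypt is a made-up name.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class HashVsEncrypt
{
    // One-way: there is no method that turns the digest back into the text.
    public static byte[] Hash(string text)
    {
        using (SHA256 sha = SHA256.Create())
            return sha.ComputeHash(Encoding.UTF8.GetBytes(text));
    }

    // Two-way: the same key and IV turn the ciphertext back into the text.
    public static byte[] Encrypt(string text, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (ICryptoTransform enc = aes.CreateEncryptor(key, iv))
        {
            byte[] plain = Encoding.UTF8.GetBytes(text);
            return enc.TransformFinalBlock(plain, 0, plain.Length);
        }
    }

    public static string Decrypt(byte[] cipher, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (ICryptoTransform dec = aes.CreateDecryptor(key, iv))
            return Encoding.UTF8.GetString(dec.TransformFinalBlock(cipher, 0, cipher.Length));
    }
}
```

Hashing the same text twice gives the same 32-byte digest, which is what makes hashes useful for comparing stored passwords without keeping the passwords themselves.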
* What is the effect of making a method of a class static, and what might its use be?
Static methods are not associated with any one instance of the class, so they cannot access the instance fields of the class unless they are handed an instance. Instead of invoking the method through an instance, you call it through the name of the class, since the method belongs to the type itself rather than to any object, like so:
string s = "some test string";
bool starts_with_some = s.StartsWith("some");
//StartsWith uses the instance s
bool not_null_or_empty = string.IsNullOrEmpty(s);
//IsNullOrEmpty is a static method
Static methods allow you to provide stand-alone methods that relate to a class's functionality. Another example might be a Country class. I might use it to represent a single country, with fields/properties like CountryID, Name or ZipCodeList. I might also include instance methods that use the current object's data, like GetIPRange() or FindContinent(). Finally, I could also add stand-alone (static) methods, like Country.GetAllCountries(), to return a List containing the name of every country on Earth.
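A minimal sketch of that Country class, showing an instance method next to a static one. Only CountryID, Name, FindContinent() and GetAllCountries() come from the description above; the constructor, the continent field and the three hard-coded countries are hypothetical stand-ins for whatever data source a real class would use.

```csharp
using System.Collections.Generic;

// Hypothetical Country class; the data is a hard-coded sample, not a real list.
public class Country
{
    private static readonly List<Country> all = new List<Country>
    {
        new Country(1, "Canada", "North America"),
        new Country(2, "France", "Europe"),
        new Country(3, "Japan", "Asia")
    };

    public int CountryID { get; private set; }
    public string Name { get; private set; }
    private readonly string continent;

    public Country(int id, string name, string continent)
    {
        CountryID = id;
        Name = name;
        this.continent = continent;
    }

    // Instance method: works with this particular country's data.
    public string FindContinent()
    {
        return continent;
    }

    // Static method: called on the class itself, needs no Country instance.
    public static List<string> GetAllCountries()
    {
        var names = new List<string>();
        foreach (Country c in all)
            names.Add(c.Name);
        return names;
    }
}
```

Usage: Country.GetAllCountries() is called on the class, while FindContinent() needs an object, as in new Country(3, "Japan", "Asia").FindContinent().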
I want to add one more note. Since I've been seeing more and more Google searches for "demand media" interview questions land on my blog, we have been restructuring our interview process to change the questions around and are working towards a much more hands-on interview. Part of that review includes going through my blog and removing these questions from our interview process (or limiting their use). So make sure you know how to use .NET in ways outside the scope of these questions.

I also want to encourage people to continue contacting me with your questions and comments. As long as there is an interest in the topic, I will continue to present real-life .NET interview questions.
Reflection on ASP.NET Auto-Compiled Classes
October 12, 2007 • 8:16AM
I came across a unique situation yesterday that took a while to figure out, but I thought it was a really cool concept!
The basic idea is that I have an ASP.NET website that references a DLL. The DLL contains an interface that other classes can implement, with the general idea of allowing external classes (external to the DLL) to act as "plug-ins". The logical location to place these classes is in the App_Code folder, since it will auto-compile the classes and make them available globally, but that's when I ran into a problem...
The DLL also contains a static class that populates a static collection of the classes, so that they can be referenced by name. Since the classes act as "plug-ins", it should be possible to modify them at any time and to drop new classes into the App_Code folder. Handling a situation like this calls for Reflection.
So, I added a using directive for System.Reflection and tried loading the type information for one of the classes using Type.GetType(). That failed miserably: the return value was null. I thought for a minute and then wrapped the class in App_Code in a unique namespace. I went back to my call to Type.GetType() and tried referencing the class using this namespace. Again the result was null, and I got a NullReferenceException as soon as I used it. (Type.GetType() with a name that isn't assembly-qualified only searches the calling assembly and mscorlib, and the App_Code classes live in neither.)
How did I get around this issue? The solution is actually VERY VERY simple! You just need to get a reference to the Assembly that the App_Code folder gets compiled into by calling Assembly.Load("App_Code"). After that, you can use the returned assembly reference to get the class type. So, if I have a class named AdamWidget in my App_Code folder that implements the IWidget interface from my DLL, the code in the DLL to load a Type instance for that class could be:
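Assembly asm = Assembly.Load("App_Code");
//use the namespace-qualified name if the class is in a namespace
Type module_type = asm.GetType("AdamWidget");
//GetType returns null when no such type exists in the assembly
if (module_type != null && module_type.GetInterface("IWidget") != null)
{
DoSomething();
}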
That's all there is to it! Now our web application can import (through the DLL) any class that implements IWidget in our App_Code folder!
Please also note that I discovered later that you can also reference the same dynamic assembly with a call to Assembly.Load("__code").
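Building on that, the static class that populates the plug-in collection could scan the whole assembly rather than asking for one class at a time. This is a minimal sketch, assuming IWidget exposes a single Render() method (the post doesn't show its members) and that every plug-in has a public parameterless constructor; WidgetRegistry, FindWidgets and Create are hypothetical names. In the web application you would pass in Assembly.Load("App_Code"); the sample AdamWidget is included only so the sketch has something to find.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical plug-in contract standing in for the IWidget in the DLL.
public interface IWidget
{
    string Render();
}

public static class WidgetRegistry
{
    // Finds concrete classes implementing IWidget, keyed by simple class name.
    public static Dictionary<string, Type> FindWidgets(Assembly asm)
    {
        var widgets = new Dictionary<string, Type>(StringComparer.OrdinalIgnoreCase);
        foreach (Type t in asm.GetTypes())
        {
            if (t.IsClass && !t.IsAbstract && typeof(IWidget).IsAssignableFrom(t))
                widgets[t.Name] = t;
        }
        return widgets;
    }

    // Creates a widget by name, or returns null if no such widget was found.
    public static IWidget Create(Dictionary<string, Type> widgets, string name)
    {
        Type t;
        if (!widgets.TryGetValue(name, out t))
            return null;
        return (IWidget)Activator.CreateInstance(t);
    }
}

// Sample plug-in, as it might appear in App_Code.
public class AdamWidget : IWidget
{
    public string Render() { return "Hello from AdamWidget"; }
}
```

If two plug-ins in different namespaces share a class name, the one found later overwrites the earlier one in this sketch; keying on Type.FullName avoids that at the cost of longer names.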
Basic Regular Expressions
September 30, 2007 • 3:13PM
Originally, I considered Regular Expressions to be a bonus skill: something nice for developers to have, but not a necessity. Recently, it seems that things would go a lot smoother for me if the people around me knew RegEx, but I've been surprised by how few people do. It might have something to do with there being only one decent book (that I've seen) on the subject.
So, I decided to write this small tutorial to give a basic description of RegEx. I will be using .NET for the examples, but the patterns themselves should be valid in most Regular Expression implementations.
Some of the examples below require a larger block of text, so I'm going to use a paragraph of random Lorem Ipsum text with some punctuation thrown in:
string lipsum = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Duis non nulla id sapien molestie pulvinar. 'Nulla vitae risus vel quam imperdiet egestas!' Vestibulum fringilla consequat pede. Quisque tortor lectus, rhoncus ut, posuere vel, rhoncus in, tellus. Fusce mi. Curabitur eget augue sit amet lorem iaculis sagittis. Nam et massa. Nunc sagittis, libero et eleifend aliquet, mi sem varius orci, sit amet sagittis turpis est nec dolor? Nulla facilisi. Proin volutpat erat a sem. Maecenas nibh libero, euismod at, consequat quis, rutrum in, turpis. Aenean erat enim, fermentum a, luctus non, bibendum in, tellus. In ac libero. Suspendisse potenti. Pellentesque tincidunt dignissim mi.";
First, an overview of some of the simpler RegEx constructs:
Literal Text
Literal text is simple: it matches exactly. Text that isn't modified by Regular Expression constructs (special punctuation) is taken literally, so a search for the RegEx pattern adam will look for my name in a block of text.
[] - Character Classes and Ranges
A character class matches any one of the characters listed inside its brackets. For example, if I want to match all the vowels in the above text I can use:
[aeiou]
This will match exactly one character (a vowel) at a time in any block of text it is matched against. I'll use this pattern to illustrate the simplest way to collect all the matches of a RegEx pattern.
using System.Text.RegularExpressions;
RegexOptions options = RegexOptions.IgnorePatternWhitespace;
options |= RegexOptions.IgnoreCase;
Regex regex = new Regex("[aeiou]", options);
MatchCollection mc = regex.Matches(lipsum);
(Please note that to conserve space, I won't repeat the lines that set up the RegexOptions, but you can assume that RegexOptions.IgnorePatternWhitespace and RegexOptions.IgnoreCase were used in each example.)
After the snippet is run, the MatchCollection object, mc, holds all the vowel matches in the Lorem ipsum text (o, e, i, u, o, o, i and so on). Note that without the RegexOptions.IgnoreCase option, our pattern would need to be [AEIOUaeiou] in order to match the uppercase letters as well.
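To actually read the matches back, you can iterate over the MatchCollection with foreach; each Match exposes the matched text through its Value property. A small sketch wrapping the snippet above (VowelDemo and Vowels are made-up names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VowelDemo
{
    // Collects the text of every vowel match, using the options assumed above.
    public static List<string> Vowels(string text)
    {
        RegexOptions options = RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase;
        Regex regex = new Regex("[aeiou]", options);
        var found = new List<string>();
        foreach (Match m in regex.Matches(text))
            found.Add(m.Value);
        return found;
    }
}
```

For example, VowelDemo.Vowels("Lorem ipsum dolor sit") returns o, e, i, u, o, o, i, the same sequence described above.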
Character classes can also be used to contain ranges. To match against any letter in the alphabet, the following character class can be used:
[A-Za-z]
When matched against the lipsum text, this will match once for each single letter. Some other example character classes are:
[0-9] Any digit
[a-ep-z] Any letter between 'a' and 'e' (inclusive) or between 'p' and 'z' (inclusive)
[A-Za-z0-9] Any letter or number
Note that each of the classes above matches exactly one character. If we need to match more characters, we can use additional RegEx constructs to say so.
First, we will look at the * modifier. This indicates that the preceding match component should be matched zero or more times. (The zero is important because it means that empty strings will match as well.) For example:
The regular expression [a-z]* will match all of the following:
a
d
adam
lorem
abcdefghijklmnopqrstuvwxyz
We can change the * into a + to match the preceding component one or more times. That is the difference between the * and the +: the * can match zero times and still satisfy the regular expression, while the + requires at least one character to be matched. For example, in the lipsum text above, the pattern [0-9]* would match but [0-9]+ wouldn't, since the former also matches empty strings. It is also important to note that, by default, regular expressions are greedy and try to match as many characters as possible. Both the * and the + will include as many characters in their matches as they can.
Additionally, we can use the ?, which matches zero or one of the preceding component, essentially making it an "optional match". So:
L?orem?
will find a match in each of the following (in Bored, the matched text is ore):
Lorem
Bored
ore
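The three quantifiers can be compared side by side with Regex.IsMatch and Regex.Match. A minimal sketch (QuantifierDemo, HasMatch and FirstMatch are made-up names); note that these calls pass no RegexOptions, so matching here is case-sensitive:

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // True if the pattern matches anywhere in the text (an empty match counts).
    public static bool HasMatch(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern);
    }

    // Text of the first match found; "" if that match is empty or there is none.
    public static string FirstMatch(string text, string pattern)
    {
        return Regex.Match(text, pattern).Value;
    }
}
```

HasMatch("Lorem ipsum dolor", "[0-9]*") is true because an empty match counts, while "[0-9]+" gives false. FirstMatch("Lorem", "[a-z]*") returns an empty string (the capital L stops the match at position 0), whereas "[a-z]+" greedily returns "orem". FirstMatch("Bored", "L?orem?") returns "ore".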
Additional Modifiers
^ (Caret)
This modifier can be used in two very different ways. The first is as the very first character inside a character class. If present there, the caret negates the meaning of the character class, which then matches ANY character except those inside the brackets.
If we match the above lipsum text against the pattern [^sjdhflo ]+ (note that it includes the space) we are matching against one or more characters in a row that are not s, j, d, h, f, l, o or space.
If we actually match the above pattern we see many results (239 in all), such as rem, ip, um and so on.
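A small sketch of the negated class on a short piece of the text. It uses RegexOptions.IgnoreCase so that the l in the class also excludes the capital L; NegatedClassDemo and Runs are made-up names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Runs of characters that are NOT s, j, d, h, f, l, o or a space.
    public static List<string> Runs(string text)
    {
        var regex = new Regex("[^sjdhflo ]+", RegexOptions.IgnoreCase);
        var runs = new List<string>();
        foreach (Match m in regex.Matches(text))
            runs.Add(m.Value);
        return runs;
    }
}
```

NegatedClassDemo.Runs("Lorem ipsum") returns rem, ip and um: the L and o of Lorem, the space and the s in ipsum all split the runs.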
^ (Caret) Part 2
The ^ can also be used outside of character classes, usually at the very beginning of a RegEx pattern. There it anchors the pattern to the start of the text. In .NET this means the start of the string only, unless you add RegexOptions.Multiline, in which case it also matches right after each hard line break (\n). So, in the above lipsum text, the pattern ^lorem would only return one match, even though the word appears in the text twice. (Note that the above text does not have any hard line breaks, only the soft line breaks caused by the formatting of the webpage, so the ^ will only match at the beginning of the entire block of text either way.)
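In .NET, ^ anchors to the start of the whole string unless RegexOptions.Multiline is set; with that option it also matches right after each \n (and $ changes in the same way). A short sketch counting ^lorem both ways (AnchorDemo and CountLineStarts are made-up names):

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts matches of ^lorem, optionally treating every line as a new start.
    public static int CountLineStarts(string text, bool multiline)
    {
        RegexOptions options = RegexOptions.IgnoreCase;
        if (multiline)
            options |= RegexOptions.Multiline;
        return Regex.Matches(text, "^lorem", options).Count;
    }
}
```

For the two-line string "Lorem ipsum dolor.\nlorem iaculis.", CountLineStarts returns 1 without Multiline and 2 with it.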
$
The $ is typically used at the end of a RegEx pattern, in a role opposite to that of the ^ shown above: it anchors the pattern to the end of the text (in .NET, the end of the string or just before a final newline; with RegexOptions.Multiline, the end of each line). See the following example:
string s = "Sally sells seashells by the sea";
Regex regex = new Regex("sea$");
MatchCollection mc = regex.Matches(s);
In the above code, mc will contain only one match: the sea at the end, because of the $. If we remove the $, it will instead have two matches, like so:
string s = "Sally sells seashells by the sea";
Regex regex = new Regex("sea");
MatchCollection mc = regex.Matches(s);
Note that if we change the original string to have proper punctuation at the end (the period) the original sea$ pattern will not match at all.
string s = "Sally sells seashells by the sea.";
Regex regex = new Regex("sea$");
MatchCollection mc = regex.Matches(s);
This is because the pattern is looking for sea at the very end of the text. Because the period is there and we're not looking for it, our pattern won't match!
There are a number of ways to fix this problem, but a general punctuation match will do the trick.
string s = "Sally sells seashells by the sea.";
Regex regex = new Regex("sea[!?.,-]$", options);
MatchCollection mc = regex.Matches(s);
Note the placement of the hyphen (-) at the end of the character class so that it isn't accidentally misinterpreted as a range of characters.
Note also the caveat that although the period outside of the character class WOULD match, it wouldn't do exactly what you think. Can you guess what the following example will print?
string s = "Sally sells seashells by the seaX";
Regex regex = new Regex("sea.$", options);
MatchCollection mc = regex.Matches(s);
if (mc.Count == 0)
Response.Write("0 matches!");
else if (mc.Count == 1)
Response.Write("1 match!");
If you guessed "1 match!", you're correct - but do you know why? The answer lies in our next modifier.
. (Period)
The . represents ANY single character (in .NET, any character except the newline \n, unless RegexOptions.Singleline is set), but exactly one character must be present in the matching position. It's really just a placeholder to say that "something" has to be in the position indicated.
In the example given above, the X fills the position that the . is in, so we get our "1 match!".
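To match an actual period at the end instead of "any character", the period has to be escaped with a backslash (escaping is covered in the next section). A small sketch contrasting the two patterns; PeriodDemo and its method names are made up:

```csharp
using System.Text.RegularExpressions;

public static class PeriodDemo
{
    // "." stands for any single character (except \n), so X and "." both fit.
    public static bool EndsWithSeaAnyChar(string s)
    {
        return Regex.IsMatch(s, "sea.$");
    }

    // "\." is a literal period, so only "sea." at the end fits.
    public static bool EndsWithSeaPeriod(string s)
    {
        return Regex.IsMatch(s, @"sea\.$");
    }
}
```

Both methods accept "Sally sells seashells by the sea.", but only EndsWithSeaAnyChar accepts the seaX version, and neither accepts the original sentence with nothing after sea.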
Escaped Characters and Shortcuts
The \ (backslash) is an extremely powerful character in RegEx pattern matching, with a plethora of uses. First, it is used to escape special characters so that they are taken as literals within Regular Expression patterns. For example, if we were to search for text within parentheses, we would need to escape the parentheses, since they are used within RegEx patterns to form groups, as I'll explain below.
To search for the word Trunks in parentheses, the pattern would be:
\(Trunks\)
The following example shows a common problem for beginners using RegEx in .NET:
string s = "This sentence is about my cat (Trunks).";
Regex regex = new Regex("\\(Trunks\\)");
MatchCollection mc = regex.Matches(s);
Note that in the above code I used two backslashes before each parenthesis. Can you figure out why? Here's an alternative way of writing it which might give you a hint:
Regex regex = new Regex(@"\(Trunks\)");
In C#, as in many other languages, characters within string literals can be escaped with the backslash character: \n is the newline, \t is the tab, \r is the carriage return and so on. The C# compiler processes the string you use for the RegEx pattern first, so escape sequences have already been converted by the time the RegEx parser sees it. A single backslash before a parenthesis, as in "\(Trunks\)", isn't a valid C# escape sequence at all, so that version won't even compile.
To get around that problem, we first escape the backslash itself using the \\ construct. As mentioned above (and below), parentheses are special characters in RegEx and need to be escaped if you intend to use them as literals.
In the string "\\(Trunks\\)", the C# compiler turns each \\ into a single backslash, so the RegEx parser actually sees \(Trunks\). That escapes the parentheses for the RegEx parser, which then correctly finds the literal parentheses. If you're unclear as to why @"\(Trunks\)" works, it is a C# verbatim string. In a verbatim string, backslashes are taken literally and escape sequences are not processed. The one exception is the double-quote character, which you represent by doubling it (""). A verbatim string can also span multiple lines.
Please remember, this applies to our C# example and won't apply to all languages. If you're not using C#, check your language reference to see whether you need to escape the backslashes.
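To convince yourself that the two spellings are the same pattern, here is a small console sketch. It also uses the framework's Regex.Escape method, which adds the RegEx backslashes for you. That's handy when the literal text comes from somewhere else, such as user input:

```csharp
using System;
using System.Text.RegularExpressions;

string s = "This sentence is about my cat (Trunks).";

string doubled = "\\(Trunks\\)";   // regular string: the compiler turns each \\ into \
string verbatim = @"\(Trunks\)";   // verbatim string: backslashes are kept as-is
bool samePattern = doubled == verbatim;   // both hold the characters \(Trunks\)

// Regex.Escape backslash-escapes RegEx metacharacters such as ( and ) in plain text
string escaped = Regex.Escape("(Trunks)");

Match m = Regex.Match(s, escaped);
Console.WriteLine($"{samePattern} {escaped} {m.Success} at index {m.Index}");
```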
RegEx Escaped Characters
RegEx has its own escape sequences, which are mostly used as shortcuts for large ranges of characters. As in many other languages, you escape RegEx characters with a backslash.
Note that in C# you need to take the above caveat into consideration when using these. For example, to search for a literal backslash, you need to escape it for both C# and RegEx, giving the C# string "\\\\" (or @"\\" as a verbatim string). The C# compiler first turns \\\\ into \\, and the RegEx parser then treats \\ as a single literal backslash to search for.
Here are a few examples:
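\d Any decimal digit. For plain ASCII text this is the same as [0-9], although .NET also matches digits from other scripts unless RegexOptions.ECMAScript is set.
\w Any word character. For plain ASCII text this is the same as [a-zA-Z_0-9], although .NET also matches accented and other Unicode letters.
\s Any whitespace character (space, tab, carriage return, newline and so on).
The capital versions offer negations...
\D The negation of \d; for ASCII text, the same as [^0-9].
\W The negation of \w; for ASCII text, the same as [^a-zA-Z_0-9].
\S Any character EXCEPT whitespace.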
There are many other, less commonly used, escape sequences. Be sure to check your RegEx implementation's documentation (for .NET, see the regular expression language reference on MSDN).
Note again that the backslash should be double-escaped when necessary.
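Here is a small console sketch of the shortcuts in action, using an invented input string. It also demonstrates the .NET detail that \d matches any Unicode decimal digit unless RegexOptions.ECMAScript is set. Finally, it searches for a literal backslash with the double-escaped pattern described above:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Order 66 shipped 3 boxes";

int digitRuns = Regex.Matches(text, @"\d+").Count;   // "66" and "3"
int wordRuns = Regex.Matches(text, @"\w+").Count;    // Order, 66, shipped, 3, boxes
int gaps = Regex.Matches(text, @"\s+").Count;        // the four spaces

// U+0663 is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit
string arabicThree = "\u0663";
bool unicodeDigit = Regex.IsMatch(arabicThree, @"^\d$");                         // true
bool ecmaDigit = Regex.IsMatch(arabicThree, @"^\d$", RegexOptions.ECMAScript);  // false

// "\\\\" in a regular C# string is the RegEx pattern \\, which matches a single backslash
bool hasBackslash = Regex.IsMatch(@"C:\Temp", "\\\\");

Console.WriteLine($"{digitRuns} {wordRuns} {gaps} {unicodeDigit} {ecmaDigit} {hasBackslash}");
```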
Grouping
In the teaching of Regular Expressions, Grouping is usually considered an advanced topic and not taught in a first lesson. Personally, I don't see the point of learning how to use RegEx unless you can use it!
A group is automatically formed every time a matching pair of parentheses is used in a pattern. Each match has an automatic group containing the entire match, followed by one group for each pair of parentheses, numbered from left to right, as shown in the following example:
string s = "Mississippi";
Regex regex = new Regex("([aeiou][s]+)");
MatchCollection mc = regex.Matches(s);
The MatchCollection.Count property would be 2, indicating that the pattern matched twice (on iss both times, since we're matching any vowel followed by the letter s one or more times). If we examine the MatchCollection, we see it is made of Match objects, each with a Groups property, which is another collection.
In every match, the first Group (mc[0].Groups[0]) always contains the entire match, in this case iss. Here, the second Group (mc[0].Groups[1]) also contains iss, because our parentheses wrap the entire pattern and the result of our match was iss. You would see exactly the same results in the second Match object, since iss appears twice in the word Mississippi. Hence both mc[1].Groups[0] and mc[1].Groups[1] would contain the string iss.
If we change the parentheses slightly and only group the vowel:
string s = "Mississippi";
Regex regex = new Regex("([aeiou])[s]+");
MatchCollection mc = regex.Matches(s);
The MatchCollection.Count property would still be 2 with the same exact matches. Also, the Groups[0] property would still contain iss, since that's our entire match. However, the Groups[1].Value would now contain only the i, since that's the entire match inside our parenthesis.
We'll change it slightly one more time:
string s = "Mississippi";
Regex regex = new Regex("([aeiou])(s)s");
MatchCollection mc = regex.Matches(s);
Our pattern is trying to match a vowel followed by two letter s characters. We are grouping the vowel by itself and additionally grouping the first of two s characters.
When we run the above code we get two matches, as expected, each one matching iss. If we examine the Groups collection, we see that Groups[1] contains the i and Groups[2] contains the first s.
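To inspect the groups yourself, this console sketch runs all three Mississippi patterns from above and prints every group of every match:

```csharp
using System;
using System.Text.RegularExpressions;

string s = "Mississippi";
string[] patterns = { "([aeiou][s]+)", "([aeiou])[s]+", "([aeiou])(s)s" };

foreach (string pattern in patterns)
{
    MatchCollection mc = Regex.Matches(s, pattern);
    Console.WriteLine($"{pattern}: {mc.Count} matches");
    foreach (Match m in mc)
    {
        // Groups[0] is always the whole match; Groups[1..] follow the parentheses left to right
        for (int g = 0; g < m.Groups.Count; g++)
            Console.WriteLine($"  match at {m.Index}, Groups[{g}] = \"{m.Groups[g].Value}\"");
    }
}

MatchCollection last = Regex.Matches(s, "([aeiou])(s)s");
```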
Match objects are also helpful if you want to modify text in the original string that you've located using a Regular Expression.
In the example below, I'm going to simply replace every instance of the pattern r[aeiou]+m with the word BLAH. The pattern matches one or more vowels between the letters r and m. (If you're unclear as to why, re-read the sections above!)
string l = lipsum;
Regex regex = new Regex("r[aeiou]+m");
Match m = regex.Match(l);
while (m.Success)
{
//cut out the matched text and put BLAH in its place
l = l.Substring(0, m.Index) + "BLAH" + l.Substring(m.Index + m.Value.Length);
//search the updated string again from the start
m = regex.Match(l);
}
We keep looping as long as a match is found, using the Index property of the Match object to determine where in the current string the match was found. We cut out the match and put BLAH in its place.
The resulting string is:
LoBLAH ipsum dolor sit amet, consectetuer adipiscing elit. Duis non nulla id sapien molestie pulvinar. 'Nulla vitae risus vel quam imperdiet egestas!' Vestibulum fringilla consequat pede. Quisque tortor lectus, rhoncus ut, posuere vel, rhoncus in, tellus. Fusce mi. Curabitur eget augue sit amet loBLAH iaculis sagittis. Nam et massa. Nunc sagittis, libero et eleifend aliquet, mi sem varius orci, sit amet sagittis turpis est nec dolor? Nulla facilisi. Proin volutpat erat a sem. Maecenas nibh libero, euismod at, consequat quis, rutBLAH in, turpis. Aenean erat enim, fermentum a, luctus non, bibendum in, tellus. In ac libero. Suspendisse potenti. Pellentesque tincidunt dignissim mi.
Which you can compare with the original, here:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Duis non nulla id sapien molestie pulvinar. 'Nulla vitae risus vel quam imperdiet egestas!' Vestibulum fringilla consequat pede. Quisque tortor lectus, rhoncus ut, posuere vel, rhoncus in, tellus. Fusce mi. Curabitur eget augue sit amet lorem iaculis sagittis. Nam et massa. Nunc sagittis, libero et eleifend aliquet, mi sem varius orci, sit amet sagittis turpis est nec dolor? Nulla facilisi. Proin volutpat erat a sem. Maecenas nibh libero, euismod at, consequat quis, rutrum in, turpis. Aenean erat enim, fermentum a, luctus non, bibendum in, tellus. In ac libero. Suspendisse potenti. Pellentesque tincidunt dignissim mi.
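For a straight substitution like the loop above, the framework's built-in Regex.Replace method does the same job in a single left-to-right pass instead of rescanning the string after every replacement. The replacement text can also refer back to a captured group with $1, $2 and so on. The short input string here is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Lorem ipsum, sit amet lorem, rutrum in.";
var regex = new Regex("r[aeiou]+m");

// Every non-overlapping match is replaced in one pass
string blah = regex.Replace(input, "BLAH");

// $1 in the replacement inserts whatever group 1 captured (here, the vowels)
string bracketed = Regex.Replace(input, "r([aeiou]+)m", "r[$1]m");

Console.WriteLine(blah);
Console.WriteLine(bracketed);
```

One behavioral difference is worth noting. Regex.Replace never re-examines text it has already replaced, whereas the loop searches the updated string from the start each time.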
That's quite a lot to take in for one entry on RegEx and should definitely get any novices a giant step closer to Regular Expression mastery! Look for an advanced discussion (including some .NET specific RegEx constructs) in the future.
Simple Speed Testing in .NET - System.Diagnostics.Stopwatch
August 18, 2007 • 6:48AM • permalink
One of the biggest problems with the .NET Framework is that so many new features are released with each major version that minor additions and features usually slip through the cracks. In my experience, very few classes have been overlooked during the transition from 1.1 to 2.0 as much as the Stopwatch class (System.Diagnostics.Stopwatch).
The Stopwatch class replaces what most people do when comparing the speed of two blocks of code that perform the same action: log the current time, run the test, subtract the logged time from the current time, rinse, repeat.
The Stopwatch class, by contrast, was built using low-level API calls, with less overhead than other .NET timing methods. If the computer's hardware and Windows version support a high-resolution performance counter, Stopwatch uses that counter instead of the standard system clock.
Here is a simple example:
using System.Diagnostics;
//Note: the using directive goes at the top of the file
Stopwatch sw = new Stopwatch();
//do any preliminary processing here, so it doesn't inflate your test results
string s1 = "";
string s2 = "";
string letters = "abcdefghijklmnopqrstuvwxyz";
int iterations = 50000;
//TEST 1: the += operator
sw.Start();
for (int i = 0; i < iterations; ++i)
s1 += letters[i % 26];
sw.Stop();
Response.Write("Test 1: " + sw.ElapsedMilliseconds + "ms");
//you can also read sw.ElapsedTicks (raw timer ticks; see Stopwatch.Frequency)
sw.Reset();
//Don't forget to reset the Stopwatch before your second test!!!
//TEST 2: string.Concat, with the result assigned back so s2 actually grows
sw.Start();
for (int i = 0; i < iterations; ++i)
s2 = string.Concat(s2, letters[i % 26]);
sw.Stop();
Response.Write(" | Test 2: " + sw.ElapsedMilliseconds + "ms");
As the comment above says, don't forget to Reset your Stopwatch between tests. Otherwise the Stopwatch keeps counting from where it was stopped, and later tests will have their times inflated.
The Stopwatch has a few other useful members, such as the IsRunning property and the static IsHighResolution field. You can view the MSDN description of the Stopwatch class here.
In case you're curious about the results, be careful with this particular comparison. The C# compiler turns s1 += x on a string into a call to string.Concat. Once the Concat result is assigned back to s2, both loops do essentially the same work, creating a new (immutable) string object on every iteration. If the result of string.Concat is thrown away instead, the second loop never builds the long string at all and will look many times faster for the wrong reason. Avoiding those extra string objects takes a different approach; more to come on this topic in a future post.
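If you time code often, a small helper keeps the Start/Stop/Reset bookkeeping in one place. The sketch below is my own addition rather than part of the test above, and the Time helper name is hypothetical. Stopwatch.StartNew() creates and starts a fresh Stopwatch in one call. Each action runs once before it is timed, so one-time JIT compilation isn't counted. It's written as a console program (C# 9+ top-level statements) with Console.WriteLine in place of Response.Write. It also assigns the string.Concat result back to the variable, so both loops build the same 10,000-character string:

```csharp
using System;
using System.Diagnostics;

const int iterations = 10000;
const string letters = "abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: run the action once to warm up, then time one run with a fresh Stopwatch
long Time(string label, Func<string> action, out string result)
{
    action();                              // warm-up run, so JIT compilation isn't timed
    Stopwatch sw = Stopwatch.StartNew();   // creates and starts a new Stopwatch
    result = action();
    sw.Stop();
    Console.WriteLine($"{label}: {sw.ElapsedMilliseconds}ms");
    return sw.ElapsedMilliseconds;
}

long plusMs = Time("+= operator", () =>
{
    string s = "";
    for (int i = 0; i < iterations; ++i)
        s += letters[i % 26];
    return s;
}, out string viaPlus);

long concatMs = Time("string.Concat", () =>
{
    string s = "";
    for (int i = 0; i < iterations; ++i)
        s = string.Concat(s, letters[i % 26]);   // assign the result back
    return s;
}, out string viaConcat);

Console.WriteLine($"High-resolution timer: {Stopwatch.IsHighResolution}, ticks per second: {Stopwatch.Frequency}");
```

A single timed run is still noisy. For anything important, repeat each measurement several times and compare the typical values.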
.NET Interview Questions - Part 2
August 26, 2007 • 9:41PM • permalink
Part 1 can be found here.
Continuing where we left off...
3) What does the term "immutable string" mean?
Strings in C# are immutable, meaning that the string object cannot be modified once it has been instantiated. Take a look at the following:
string s1 = "Hello, my name is ";
s1 += "adam";
While it appears that the string 'adam' is simply being appended to the original string, a lot more work is taking place behind the scenes. The string that s1 refers to is immutable, so it can't be changed. Instead, C# creates a new buffer in memory that can hold both "Hello, my name is " and "adam", copies both into it to form a new string object, and assigns s1 the new reference. Once nothing else refers to the old string object, it eventually becomes eligible for garbage collection.
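A short console sketch makes the immutability visible. Keeping a second reference to the original string shows that += produces a different object and leaves the original text untouched:

```csharp
using System;

string s1 = "Hello, my name is ";
string before = s1;          // a second reference to the same string object

s1 += "adam";                // builds a NEW string and points s1 at it

bool sameObject = ReferenceEquals(before, s1);   // false: s1 now refers to a different object
Console.WriteLine($"before = \"{before}\"");      // the original text is untouched
Console.WriteLine($"s1     = \"{s1}\"");
Console.WriteLine($"same object? {sameObject}");
```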
If you read my blog entry on the System.Diagnostics.Stopwatch class, you saw that this can have a huge effect on the performance of an application.
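In the Stopwatch blog entry, I compared string.Concat against the += operator. For a single append there is no real difference between the two, since the C# compiler turns += on strings into a string.Concat call. Concat only helps when it combines several strings in one call (for example, the overloaded version that takes four string parameters), because that creates one new string instead of several intermediate ones. For doing many append operations, there is a much better way.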
using System.Text;
StringBuilder sb = new StringBuilder();
string letters = "abcdefghijklmnopqrstuvwxyz";
int iterations = 100000;
for (int i = 0; i < iterations; ++i)
sb.Append(letters[i % 26]);
In my tests, the StringBuilder class's Append method was twice as fast as using string.Concat. This is because StringBuilder appends into an internal, growable character buffer. string.Concat, on the other hand, creates a new immutable string object on every call and copies all of the existing characters into it.
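Here's the loop above as a complete console sketch. Two additions are my own assumptions rather than part of the original test. First, it passes the final length to the StringBuilder(int capacity) constructor so the internal buffer never has to grow. Second, it calls ToString() once at the end to get the finished string:

```csharp
using System;
using System.Text;

const string letters = "abcdefghijklmnopqrstuvwxyz";
const int iterations = 100000;

// Reserve room for every character up front so the buffer never has to grow
var sb = new StringBuilder(iterations);
for (int i = 0; i < iterations; ++i)
    sb.Append(letters[i % 26]);

// Convert to an (immutable) string once, after all appends are done
string result = sb.ToString();

Console.WriteLine($"Length: {result.Length}, starts with: {result.Substring(0, 5)}");
```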
4) Describe the concept of lazy-initialization in OOP?
Lazy-initialization is best shown with a working example...
Suppose, for instance, that we have a class that represents a Building, perhaps for a game. In this Building class, we have various properties to represent the different rooms that you might find in the building.
For the sake of argument, let us also suppose that the general architecture of the game dictates that the Building object is going to be persisted in a database and not kept in static memory (maybe it's web-based). When we create an instance of the Building object, it would be a huge mistake to instantiate all of the Building's rooms in the constructor.
There could be a Kitchen, a JanitorCloset, a SupplyRoom, a Gym, a Hallway or maybe even Bathroom[]. If only one of the above is used in the current execution, then we would waste a lot of processing on both the web server and possibly on either a cache server or a database server (or both), retrieving data for the rooms we don't need. This is a very common mistake that is made by developers that are new to OOP.
Lazy-initialization can help us improve our performance by only creating the objects as they are needed. Like in the following example:
public class Building
{
private Room kitchen;
private Room supply_room;
...
public Room Kitchen
{
get {
if (kitchen == null)
kitchen = new Room(building_id, "kitchen");
return kitchen;
}
}
public Room SupplyRoom
{
get {
if (supply_room == null)
supply_room = new Room(building_id, "supply_room");
return supply_room;
}
}
}
In the code above, the first time the Kitchen or SupplyRoom properties are used, their associated private fields will be null and will be instantiated through their respective Room constructors. If they aren't used in a given request, they won't be instantiated at all!
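Since .NET 4.0, the framework's System.Lazy<T> class packages up this null-check pattern. Its default mode is also thread-safe, which the hand-written properties above are not. In this sketch a plain string stands in for the hypothetical Room class. The roomsBuilt counter is purely illustrative, there to show exactly when the factory runs:

```csharp
using System;

int roomsBuilt = 0;

// Lazy<T> calls the factory delegate only on the first access to .Value
var kitchen = new Lazy<string>(() =>
{
    roomsBuilt++;          // illustrative counter: how many times the "room" was built
    return "kitchen";      // stands in for new Room(building_id, "kitchen")
});

bool createdBefore = kitchen.IsValueCreated;   // false: nothing built yet
Console.WriteLine($"Before access: created={createdBefore}, rooms built={roomsBuilt}");

string first = kitchen.Value;    // the factory runs here
string second = kitchen.Value;   // the cached value comes back; the factory does not run again

Console.WriteLine($"After access: created={kitchen.IsValueCreated}, rooms built={roomsBuilt}");
```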
5) Write a Generic Method that takes one parameter of any type and returns the same type that is passed in.
This question really isn't fair, but we like to throw it out there anyway. We don't judge candidates on their ability to answer it: although one person came close, nobody has ever answered this simple question correctly. It's actually one of the first questions we ask, mostly to set the bar. Most of the time we see candidates write a function that at least attempts to use Generics, which is better than some other things I've seen... (Note that Generics require .NET 2.0 or later.)
The solution is actually very simple:
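using System.Collections.Generic; // Place this at the top of the file (List<T> is used below)

// SomeFunc must be declared inside a class
public T SomeFunc<T>(T obj)
{
    T new_object_copy = obj;
    return new_object_copy;
}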
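That's it! I only added the line T new_object_copy = obj to do something more interesting than just return the object. Note that for reference types this copies the reference, not the object itself, so both variables point to the same instance.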
We can now call this with virtually any type we want!
string s = SomeFunc<string>("hi");
int i = SomeFunc<int>(5);
my_class m = SomeFunc<my_class>(new my_class());
As a bonus, you can constrain the types of objects that you want to allow the function to operate on by adding a where clause:
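using System.Collections; // IList and IEnumerable (the non-generic interfaces)

public T SomeFunc<T>(T obj) where T : IList, IEnumerable
{
    T new_object_copy = obj;
    return new_object_copy;
}

List<string> l = new List<string>();
List<string> l2 = SomeFunc<List<string>>(l); // This will work
string s = SomeFunc<string>("hi");           // This will no longer compile: string does not implement IList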
Simple!
Maybe so, but we've yet to see somebody answer it correctly (and I've done literally fifty interviews in the last six months). We even joke around the office that if somebody can answer it correctly, they are an "instant hire". (Offer now null and void, since I've given the solution away.)
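To try the solution outside an interview, here is a self-contained sketch. The class name GenericDemo and the second method name SomeListFunc are made up for illustration; the constrained version gets its own name because C# does not allow two overloads that differ only in their constraints. The methods are static so they can be called without an instance, and the compiler can usually infer T from the argument, so the explicit <string> or <int> is optional.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class GenericDemo
{
    // Accepts any T (value or reference type) and returns the same type.
    public static T SomeFunc<T>(T obj)
    {
        // For value types this copies the value; for reference types it
        // copies the reference, so both variables point at the same object.
        T new_object_copy = obj;
        return new_object_copy;
    }

    // Only types implementing the non-generic IList are accepted.
    // IList already extends IEnumerable, so this one constraint covers both.
    public static T SomeListFunc<T>(T obj) where T : IList
    {
        return obj;
    }

    public static void Example()
    {
        int i = SomeFunc(5);        // T inferred as int
        string s = SomeFunc("hi");  // T inferred as string

        List<string> l = new List<string>();
        List<string> l2 = SomeListFunc(l); // OK: List<string> implements IList
        // SomeListFunc("hi");             // compile error: string does not implement IList
    }
}
```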

.NET Quickies
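* What is the C# null-coalescing operator and how is it used?
The null-coalescing operator in C# is the ?? operator. The expression a ?? b evaluates to a if a is not null, and to b otherwise, so it is commonly used to supply a default value in an assignment. It works with reference types and nullable value types, and when chained (a ?? b ?? c) it yields the first non-null operand, evaluating from left to right. For example: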
string a = null;
string b = "Yay!";
string c = a ?? b; // a is null, so the value of b ("Yay!") is assigned to c
* What is the size (in bytes) of the following data types on a 32-bit machine: byte, short, int, float and double?
byte = 1 byte
short = 2 bytes
int = 4 bytes
float = 4 bytes
double = 8 bytes
These sizes are fixed by the C# language specification, so they are the same on a 64-bit machine.
* What is the result of bit-shifting an integer (either to the right or the left)?
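Bit-shifting an integer to the left will multiply the number by two for each bit shifted, as long as the result still fits in the type; bits shifted out of the top are simply discarded, so the value can overflow or change sign.
Bit-shifting an integer to the right will divide the number by two for each bit shifted, discarding the remainder. For negative signed integers the result rounds toward negative infinity rather than toward zero, so -5 >> 1 is -3, while -5 / 2 is -2.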
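The three quickies above can be checked with a small sketch. The Quickies class and its method names are hypothetical helpers for illustration. It shows ?? chaining and its use with a nullable int, reads the type sizes with sizeof (allowed in safe code for the built-in numeric types), and demonstrates where the "multiply/divide by two" rule for shifts needs care.

```csharp
public static class Quickies
{
    // a ?? b ?? c is parsed as a ?? (b ?? c): the result is the first non-null operand.
    public static string FirstNonNull(string a, string b, string c)
    {
        return a ?? b ?? c;
    }

    // ?? also works with nullable value types, e.g. to supply a default.
    public static int OrDefault(int? value, int fallback)
    {
        return value ?? fallback;
    }

    // sizeof on these built-in types is a compile-time constant; no unsafe code needed.
    public static int[] Sizes()
    {
        return new int[] { sizeof(byte), sizeof(short), sizeof(int), sizeof(float), sizeof(double) };
    }

    // x << n multiplies by 2^n as long as the mathematical result still fits in an int.
    public static int ShiftLeft(int x, int n)
    {
        return x << n;
    }

    // For int, >> is an arithmetic shift: it divides by 2^n, rounding toward negative infinity.
    public static int ShiftRight(int x, int n)
    {
        return x >> n;
    }
}
```

For instance, Quickies.ShiftRight(-5, 1) returns -3 while -5 / 2 is -2, and Quickies.ShiftLeft(int.MaxValue, 1) wraps around to -2 instead of doubling (shifts never throw, even in a checked context).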
I could talk about simple interview questions forever! I'm sure that I will have much more to write about on this topic in the future!