I'm Eddy G. Apple enthusiast. perl aficionado. Newbie iOS developer. Photographer. Single-malt scotch drinker. #Trance music fan.

The United States Has Forgotten How to Make Bridge-Quality Steel

The Verrazano-Narrows Bridge was a feat of American engineering when it was built across New York's harbor in the 1960s. Now, it's being repaired with steel made in China.

Chinese bridge steel was cheaper.
US steel contractors have either gone out of business or get very few projects, and thus have little active experience.
Chinese companies have become specialists in making parts for bridges across the U.S.

Last year, New York's Metropolitan Transportation Authority awarded a $235.7 million contract to a California contractor to repair the Verrazano-Narrows, a towering suspension bridge that is still the longest in the U.S.

The contractor, Tutor Perini, subcontracted the fabrication of steel decks for the bridge to China Railway Shanhaiguan Bridge Group, which the MTA says is using 15,000 tons of steel plate made by China's Anshan Iron and Steel Group.


Eight Of The World’s Most Mind-Blowing Natural Phenomena


Catatumbo Lightning


Confined to the skies above Venezuela’s Lake Maracaibo, the ceaseless streaks of Catatumbo lightning have captivated scientists, explorers and artists for centuries. For nearly half the year and up to ten hours a day, the phenomenon, attributed to natural methane and oil deposits, can be observed on the bucolic Venezuelan horizon up to 280 times an hour. And if you happen to visit Venezuela when the lightning can’t be observed, fret not; while these flashes of light are technically momentary, Catatumbo lightning has worked its way into the state’s anthem.


Nacreous Clouds


While the pastel-tinted pictures of nacreous clouds might seem more akin to an abstract artist’s thoughts on Spring than natural science, the clouds owe their pristine coloring to the stratosphere in which they reside. Alternately dubbed the “mother of pearl” cloud given its iridescent coloring, the nacreous cloud may only be found in the early evening or dawn and in particularly frigid regions at distances of 9 to 16 miles above ground. So for all of you Loch Ness Monster hunters scaling the depths of Scotland’s Inverness, if you don’t end up discovering Nessie, just look up; you might bear witness to something just as mystical. Given the cloud’s peculiar shape, coloring and moments of visibility, many individuals who aren’t familiar with the cloud often mistake it for a UFO.


Fire Rainbows


Unfortunately, those within the weather world would be quick to tell you that what you might call a fire rainbow is actually a circumhorizontal arc. All extremely long adjectives aside, the smoking-hot rainbow you might be fortunate enough to witness swaying among the clouds is actually cold as ice and not related to rainbows at all. Known as the rarest of all naturally-occurring phenomena, for the fire rainbow to be seen very specific elements must be at play: first of all, the clouds through which the light refracts must be at least 20,000 feet in the air and must also be of the cirrus variety. Further, the sun has to be elevated at an angle of precisely 58 degrees. What this often translates to is that those picnicking in the park in the United States are more apt to be warmed by the icy light’s technicolor rays than those in Northern Europe given the region’s extreme fluctuations in sunlight. Sorry, Denmark.


Sun Dogs


The natural phenomenon commonly known as sun dogs has beguiled philosophical greats from Aristotle all the way to Descartes. It was a sun dog sighting, after all, that caused Descartes to take a break from his metaphysical studies and write his book on natural philosophy, aptly called “The World”.

Like the fire rainbow, the sun dog consists of vertically-aligned ice crystals which, when the angle is right, create a horizontal refraction and halo-shaped figure around the sun.


Striped Icebergs

Never has the sight of dead krill and trapped sediment on ice been more beautiful. The candy-striped icebergs seen floating around–most commonly around 1,700 miles south of Cape Town–are the result of ice crystals forming beneath an iceberg and rising up to the berg’s bottom, inevitably trapping dark-colored sediment and krill within it.


Fire Whirls


While the infernal twister is typically seen for only minutes at a time, its damning effects can certainly seem eternal. 10 to 50 meters tall, fire whirls are formed by unique air temperatures and currents and have enough force to uproot a tree up to 50 feet tall.


Monarch Butterfly Migrations


Few migrations are surrounded with as much mystery as that of the monarch butterfly. The monarch is the only butterfly known to make north-south migrations like birds, and it is capable of making transatlantic crossings. Scientists are still baffled by its ability to return to the same spots year after year–especially as no single butterfly can make it the entire way. Seeing as monarch butterflies only have a lifespan of approximately two months, females–largely immune to predators given that they are poisonous to birds–lay eggs along the way, thus making it possible for the several-month cycle to continue indefinitely in spite of their mortality.


The post Eight Of The World’s Most Mind-Blowing Natural Phenomena appeared first on All That Is Interesting.


Slow Cooker Baked Potatoes - An easy way to "bake" regular and sweet potatoes without oven heat


Cooking just doesn't get much easier than this. You may be thinking, "It's not difficult to bake potatoes in the oven." I agree. But, here's what I love about this slow cooker method for "baking" potatoes:

  • It doesn't require any oven heat; that's especially a plus during the hot summer months when you don't want to heat up the kitchen by having the oven on for an hour.
  • The potatoes can be started in the slow cooker in the morning and forgotten until it's time to eat them with dinner--no fuss, no stress!
  • They can be prepped the night before, so all you have to do on a busy morning is plug in the slow cooker.

This is so easy, it's kind of hard to call it a recipe. It's really more of a technique. I first learned about this from a Martha Stewart recipe. Here's how to bake spuds the easy way:

Step-by-step photos for making
Slow Cooker Baked Potatoes

Step 1. Assemble the ingredients:

  • potatoes: Idaho, Yukon gold, and sweet potatoes all work well. I prefer organic potatoes, especially since I like to eat the nutrient-rich skin. You can cook whatever quantity suits you, as long as you don't fill your slow cooker more than 3/4 full. I can fit approx. 4 lbs of spuds in my 6.5 qt. slow cooker.
  • olive oil -- this is optional, but it adds flavor while the spuds cook
  • salt & pepper -- also optional, but adds flavor; I prefer freshly ground sea salt and black pepper
  • herbs, garlic powder, other seasonings -- all optional. I normally just stick with salt and pepper; but you can add more flavors, if you like.

view on Amazon:
My 6.5-qt slow cooker (rated #1 by Cooks Illustrated)
Oxo salt & pepper grinders (My son recommended these and they are awesome--there's a dial to easily adjust from fine to coarse grind. They work better than any grinders I've used before)


Step 2. Wash and scrub the potatoes, removing any rotten or bad spots. I like to use a 3M scrub pad for scrubbing. Although they're made for scrubbing dishes & pots, they work great for scrubbing potatoes, because they're flexible and can get into every nook and cranny. These scrubbers are widely available at grocery and discount stores (Target, Walmart) and are also available on Amazon.

     view on Amazon: 3M Scotch Brite scrub pads

Let the potato skins dry on the outside before proceeding. It only takes a few minutes.

Step 3. Prick each potato with a fork--approx. 6-8 times per potato. This keeps them from exploding when steam builds up inside as they cook.


Step 4. Place potatoes on a large baking sheet or dish (to contain the mess), drizzle a little olive oil on each one, and rub oil all over each potato with your hands.

Step 5. Sprinkle on salt and pepper, or whatever other herbs or seasonings your little heart desires.


Step 6. Wrap each potato individually in aluminum foil, sealing them up tightly. I use pre-cut aluminum foil sheets that I bought at Costco. They are so convenient and also available on Amazon.

    view on Amazon: economy priced aluminum foil sheets


Step 7. Place the wrapped potatoes in the slow cooker, foil seam side up. No need to grease the inside of the crock pot or add any liquid. Cook them dry--there's no mess or clean up when you're through. Cook them on low for 8 hours (slow cooker times may vary), until tender when pressed with fingers.

  • Note: It's important not to over-fill your slow cooker. If you do, the potatoes on top won't get done before the ones on the bottom are overcooked. I fill mine approx. 3/4 full.


Done! Remove the foil, cut the potatoes across the top, press in the two ends toward the center, and the potato should open up.

The results:

  • The inside is moist and delicious. The slow cooking time enhances the natural sugars in the potato, so they taste a bit sweeter than when they're baked in the oven. 
  • They don't have the white, dry, fluffy texture of potatoes baked in the oven. Slow cooking makes them more moist and sweet. Both taste good; they're just different.
  • You may notice that the flesh next to the skin has darkened. That's because the pigment and flavors from the skin and olive oil have been absorbed into the potato. It tastes great, so you don't need to avoid eating those darkened areas (as long as you were careful to remove any rotten spots before cooking them).


Regular white Russet potatoes, Yukon Golds, and sweet potatoes all turn out great in the slow cooker. The flavor of the sweet potatoes is A-MAZING! My favorite--and they are crazy nutritious.


OOPS ALERT! Here's how a Russet potato looked that was overcooked in the slow cooker. I had left a couple of cooked potatoes in my slow cooker and thought I'd turned it off but actually turned it on high. 2 hours later, I discovered darkened, overcooked potatoes. Yuck. Lesson learned. 



Many ways to enjoy baked potatoes...

Serve them as a side with a meal.

  • They're traditionally topped with sour cream & chives. For a healthier alternative, try Greek yogurt or my healthy sour cream substitute that's made with cottage cheese.
    view Healthy Sour Cream Substitute recipe


Serve them as a main course.

  • Have a Baked Potato Bar with topping options for loading them up. This makes a fun family meal or party buffet. You can prep all the toppings ahead of time and slow cook the potatoes during the day, so everything is ready at dinner time. Easy!
    view my Baked Potato Bar topping tips and recipe



Make soup with the leftovers.


There are so many ways to enjoy baked potatoes, and using a crock pot makes them easier than ever to prepare.

Make it a Yummy day!

Baked Potatoes in a Slow Cooker
By Monica
  • Russet, Yukon Gold, or sweet potatoes -- enough to fill your slow cooker no more than 3/4 full*
  • olive oil (1-2 tablespoons for 4 lbs. potatoes)
  • salt & pepper
  • other seasonings, optional (garlic powder, herbs, etc.)
You need: a slow cooker and aluminum foil

Directions: Wash, scrub, and cut out any bad spots from potatoes. Wipe them dry and let them sit until skin is visibly dry all over (they should dry within a few minutes). Prick each potato 6-8 times with a fork. Place potatoes in single layer on baking sheet or large dish, drizzle with olive oil, and use hands to rub a thin coat of olive oil evenly all over the potato skin. Sprinkle with salt & pepper (and other herbs or seasoning, if desired). Wrap each potato individually in a piece of aluminum foil. Add potatoes to slow cooker, foil seam side up, being careful that it's not more than 3/4 full. Cover and cook on low for 8 hours (slow cooker times may vary), until tender when pressed with fingers.

*approximately 4 lbs of potatoes can be cooked in a 6-1/2 qt. slow cooker.

Had no idea you could cook potatoes this way!
8 public comments
3425 days ago
My love of potatoes is never-ending.
Richmond, VA
3428 days ago
!!! mind blown. Also that baked sweet potato w/asparagus and…feta? is v. intriguing.
3428 days ago
uh, Darius, we're doing this FYI.
Portland, OR
3428 days ago
ohhh my god
3424 days ago
UPDATE: Darius made these today. Will report tomorrow live on twitter how they turned out (he did the cinnamon + sweet potato variant).
3422 days ago
I made good old russets on Saturday and even though they cooked a little too long (I was at rehearsal), they were AMAZEBALLS.
3422 days ago
3429 days ago
Now I just need to get a new slow cooker...
Austin, TX
3429 days ago
I can't believe I didn't know you could bake potatoes in a slow cooker! Best news ever!
San Francisco
3429 days ago
I want a loaded baked potato now--slow cooker baked potatoes.
Louisville, KY

Want Real Eggs at McDonald's? Just Ask!



[Photographs: J. Kenji Lopez-Alt]

I gotta admit it: I have a secret love for McDonald's breakfast sandwiches. On the morning after a rough night out, I wake up with a deep hole in the pit of my stomach. A McDonald's bacon, egg, and cheese biscuit-shaped hole that I only know of one way to fill. If you don't know what I'm talking about or feel like averting your eyes in disgust at the image of that neatly-wrapped bundle of salt and fat above, then you may as well hit the close button on your browser right now. This is not the post for you.

For the rest of you, come with me. I've got a little secret to share.

Are we all in agreement that the biscuit option at McDonald's is the best of the sandwich-holders, handily defeating the lame English muffins, trouncing those squishy round things they like to call bagels, and narrowly edging out the salty-sweet pleasure of a McGriddle?

And are we also in agreement that the worst part of their biscuit sandwiches is that strangely folded egg patty? It's pre-cooked, reheated, rubbery, oddly flavored, not completely unpleasant, but definitely not egg-like.

Well here's the deal: you can get your McDonald's biscuit sandwiches (or any breakfast sandwich, for that matter) made with a 100% real egg, cracked and cooked fresh on-premises. All you've got to do is tell the cashier that you'd like your sandwich made with a "round egg" and they'll replace your folded egg patty with a real egg, free of charge.


It'll even appear on your receipt that way. The round eggs are the same ones they use on the Egg McMuffin, made from a real egg cooked on the flattop in a ring-shaped mold. The difference it makes for the sandwich is huge.


An egg sandwich from McDonald's that actually tastes like egg? Who'da thunk it?


Take a look at their relative cross-sections. The round egg even has a touch of lightly-cooked, soft yolk in the center. Just like a real fried egg. Almost.

About the author: J. Kenji Lopez-Alt is the Chief Creative Officer of Serious Eats where he likes to explore the science of home cooking in his weekly column The Food Lab. You can follow him at @thefoodlab on Twitter, or at The Food Lab on Facebook.

This is a nice "food hack".

Instagram vs. Twitter's Vine App


Hahaha... perfect!
3479 days ago
Yep, that’s it.
The Republic of California

The true power of regular expressions


Comments: "The true power of regular expressions"

URL: http://nikic.github.com/2012/06/15/The-true-power-of-regular-expressions.html

As someone who frequents the PHP tag on StackOverflow I pretty often see questions about how to parse some particular aspect of HTML using regular expressions. A common reply to such a question is:

You cannot parse HTML with regular expressions, because HTML isn’t regular. Use an XML parser instead.

This statement - in the context of the question - is somewhere between very misleading and outright wrong. What I’ll try to demonstrate in this article is how powerful modern regular expressions really are.

What does “regular” actually mean?

In the context of formal language theory, something is called “regular” when it has a grammar where all production rules have one of the following forms:

B -> a
B -> aC
B -> ε

You can read those -> rules as “The left hand side can be replaced with the right hand side”. So the first rule would be “B can be replaced with a”, the second one “B can be replaced with aC” and the third one “B can be replaced with the empty string” (ε is the symbol for the empty string).

So what are B, C and a? By convention, uppercase characters denote so-called “non-terminals” - symbols which can be broken down further - and lowercase characters denote “terminals” - symbols which cannot be broken down any further.

All that probably sounds a bit abstract, so let’s look at an example: Defining the natural numbers as a grammar.

N -> 0
N -> 1
N -> 2
N -> 3
N -> 4
N -> 5
N -> 6
N -> 7
N -> 8
N -> 9
N -> 0N
N -> 1N
N -> 2N
N -> 3N
N -> 4N
N -> 5N
N -> 6N
N -> 7N
N -> 8N
N -> 9N

What this grammar says is:

A natural number (N) is
... one of the digits 0 to 9
... one of the digits 0 to 9 followed by another natural number (N)

In this example the digits 0 to 9 would be terminals (as they can’t be broken down any further) and N would be the only non-terminal (as it can be and is broken down further).

If you have another look at the rules and compare them to the definition of a regular grammar from above, you’ll see that they meet the criteria: The first ten rules are of the form B -> a and the second ten rules follow the form B -> aC. Thus the grammar defining the natural numbers is regular.

Another thing you might notice is that even though the above grammar defines such a simple thing, it is already quite bloated. Wouldn’t it be better if we could express the same concept in a more concise manner?

And that’s where regular expressions come in: The above grammar is equivalent to the regex [0-9]+ (which is a hell of a lot simpler). And this kind of transformation can be done with any regular grammar: Every regular grammar has a corresponding regular expression which defines all its valid strings.
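One way to convince yourself of that equivalence is to run both descriptions against the same inputs. The following is only a sketch, written in Python purely for illustration; `matches_grammar` is a hypothetical helper that follows the two rule shapes of the grammar above (N -> digit and N -> digit N) literally, while `matches_regex` uses the regex [0-9]+.

```python
import re

DIGITS = "0123456789"

def matches_grammar(s: str) -> bool:
    """Check s against the grammar N -> d | dN, where d is a digit 0-9."""
    if not s or s[0] not in DIGITS:
        return False               # every rule starts with exactly one digit
    if len(s) == 1:
        return True                # rule of the form N -> d
    return matches_grammar(s[1:])  # rule of the form N -> dN

def matches_regex(s: str) -> bool:
    """Check s against the equivalent regex [0-9]+."""
    return re.fullmatch(r"[0-9]+", s) is not None

# Both checks should agree on every sample, valid or not.
samples = ["0", "42", "007", "", "4a2", "12 "]
agreement = all(matches_grammar(s) == matches_regex(s) for s in samples)
```

Agreement on a handful of samples is of course no proof, but it shows how the twenty rules and the five-character regex describe the same set of strings.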

What can regular expressions match?

Thus the question arises: Can regular expressions match only regular grammars, or can they also match more? The answer to this is both yes and no:

Regular expressions in the formal grammar sense can (pretty much by definition) only parse regular grammars and nothing more.

But when programmers talk about “regular expressions” they aren’t talking about formal grammars. They are talking about the regular expression derivative which their language implements. And those regex implementations are only very slightly related to the original notion of regularity.

Any modern regex flavor can match a lot more than just regular languages. How much exactly, that’s what the rest of the article is about.

To keep things simple, I’ll focus on the PCRE regex implementation in the following, simply because I know it best (as it’s used by PHP). Most other regex implementations are quite similar though, so most stuff should apply to them too.

The language hierarchy

In order to analyze what regular expressions can and cannot match, we first have to look at what other types of languages there are. A good starting point for this is the Chomsky hierarchy:

/-------------------------------------------\
|                                           |
|     Recursively enumerable languages      | Type 0
|                                           |
|   /-----------------------------------\   |
|   |                                   |   |
|   |    Context-sensitive languages    |   | Type 1
|   |                                   |   |
|   |   /---------------------------\   |   |
|   |   |                           |   |   |
|   |   |  Context-free languages   |   |   | Type 2
|   |   |                           |   |   |
|   |   |   /-------------------\   |   |   |
|   |   |   | Regular languages |   |   |   | Type 3
|   |   |   \-------------------/   |   |   |
|   |   \---------------------------/   |   |
|   \-----------------------------------/   |
\-------------------------------------------/

As you can see the Chomsky hierarchy divides formal languages into four types:

Regular languages (Type 3) are the least powerful, followed by the context-free languages (Type 2), the context-sensitive languages (Type 1) and finally the almighty recursively enumerable languages (Type 0).

The Chomsky hierarchy is a containment hierarchy, so the smaller boxes in the above image are fully contained in the larger boxes. For example every regular language is also a context-free language (but not the other way around!)

So, let’s move one step up in that hierarchy: We already know that regular expressions can match any regular language. But can they also match context-free languages?

(Reminder: When I say “regular expression” here I obviously mean it in the programmer sense, not the formal language theory sense.)

Matching context-free languages

The answer to this is yes, they can!

Let’s take the classical example of a context-free language, namely {a^n b^n, n>0}, which means “A number of a characters followed by the same number of b characters”. The (PCRE) regex for this language is:

^(a(?1)?b)$


The regular expression is very simple: (?1) is a reference to the first subpattern, namely (a(?1)?b). So basically you could replace the (?1) by that subpattern, thus forming a recursive dependency:

^(a(?1)?b)$
^(a(?:a(?1)?b)?b)$
^(a(?:a(?:a(?1)?b)?b)?b)$
# and so on

From the above expansions it should be clear that this expression can match any string consisting of some number of as followed by the same number of bs.
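To make the recursion tangible, here is a rough emulation of the subpattern (a(?1)?b) as an ordinary recursive function. It is written in Python, whose standard `re` module has no (?1)-style recursion, so this is a hand-written stand-in rather than a use of PCRE itself; the names `match_group` and `is_anbn` are made up for this sketch.

```python
def match_group(s: str, pos: int) -> int:
    """Emulate the subpattern (a(?1)?b) starting at index pos.

    Returns the index just past the match, or -1 if there is no match.
    """
    if pos >= len(s) or s[pos] != "a":
        return -1
    pos += 1                       # consumed the leading 'a'
    inner = match_group(s, pos)    # the optional recursive call, (?1)?
    if inner != -1 and inner < len(s) and s[inner] == "b":
        return inner + 1           # recursion taken, then the closing 'b'
    if pos < len(s) and s[pos] == "b":
        return pos + 1             # recursion skipped, then the closing 'b'
    return -1

def is_anbn(s: str) -> bool:
    """Emulate the anchored form: the group must span the whole string."""
    return match_group(s, 0) == len(s)
```

Each level of recursion consumes one a on the way in and one b on the way out, which is exactly why the counts have to agree.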

Thus regular expressions can match at least some non-regular, context-free grammars. But can they match all? To answer that, we first have to look at how context-free grammars are defined.

In a context-free grammar all production rules take the following form:

A -> β

Here A once again is a non-terminal symbol and β is an arbitrary string of terminals and non-terminals. Thus every production rule of a context-free grammar has a non-terminal on the left hand side and an arbitrary symbol string on the right hand side.

As an example, have a look at the following grammar:

function_declaration -> T_FUNCTION is_ref T_STRING '(' parameter_list ')' '{' inner_statement_list '}'
is_ref -> '&'
is_ref -> ε
parameter_list -> non_empty_parameter_list
parameter_list -> ε
non_empty_parameter_list -> parameter
non_empty_parameter_list -> non_empty_parameter_list ',' parameter
// ... ... ...

What you see there is an excerpt from the PHP grammar (just a few sample rules). The syntax is slightly different from what we used before, but should be easy to understand. One aspect worth mentioning is that the uppercase T_SOMETHING names here are also terminal symbols. These symbols, which are usually called tokens, encode more abstract concepts. E.g. T_FUNCTION represents the function keyword and T_STRING is a label token (like getUserById or some_other_name).

I’m using this example to show one thing: Context-free grammars are already powerful enough to encode quite complex languages. That’s why pretty much all programming languages have a context-free grammar. In particular this also includes well-formed HTML.

Now, back to the actual question: Can regular expressions match all context-free grammars? Once again, the answer is yes!

This is pretty easy to prove as regular expressions (at least PCRE and similar) provide a syntax very similar to the above for constructing grammars:

(?(DEFINE)
 (?<addr_spec> (?&local_part) @ (?&domain) )
 (?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
 (?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
 (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dtext) )* (?&FWS)? \] (?&CFWS)? )
 (?<dtext> [\x21-\x5a] | [\x5e-\x7e] | (?&obs_dtext) )
 (?<quoted_pair> \\ (?: (?&VCHAR) | (?&WSP) ) | (?&obs_qp) )
 (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)? )
 (?<dot_atom_text> (?&atext) (?: \. (?&atext) )* )
 (?<atext> [a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+ )
 (?<atom> (?&CFWS)? (?&atext) (?&CFWS)? )
 (?<word> (?&atom) | (?&quoted_string) )
 (?<quoted_string> (?&CFWS)? " (?: (?&FWS)? (?&qcontent) )* (?&FWS)? " (?&CFWS)? )
 (?<qcontent> (?&qtext) | (?&quoted_pair) )
 (?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&obs_qtext) )
 # comments and whitespace
 (?<FWS> (?: (?&WSP)* \r\n )? (?&WSP)+ | (?&obs_FWS) )
 (?<CFWS> (?: (?&FWS)? (?&comment) )+ (?&FWS)? | (?&FWS) )
 (?<comment> \( (?: (?&FWS)? (?&ccontent) )* (?&FWS)? \) )
 (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
 (?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&obs_ctext) )
 # obsolete tokens
 (?<obs_domain> (?&atom) (?: \. (?&atom) )* )
 (?<obs_local_part> (?&word) (?: \. (?&word) )* )
 (?<obs_dtext> (?&obs_NO_WS_CTL) | (?&quoted_pair) )
 (?<obs_qp> \\ (?: \x00 | (?&obs_NO_WS_CTL) | \n | \r ) )
 (?<obs_FWS> (?&WSP)+ (?: \r\n (?&WSP)+ )* )
 (?<obs_ctext> (?&obs_NO_WS_CTL) )
 (?<obs_qtext> (?&obs_NO_WS_CTL) )
 (?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )
 # character class definitions
 (?<VCHAR> [\x21-\x7E] )
 (?<WSP> [ \t] )
)
^(?&addr_spec)$

What you see above is a regular expression for matching email addresses as per RFC 5322. It was constructed simply by transforming the BNF rules from the RFC into a notation that PCRE understands.

The syntax is quite simple:

All rule definitions are wrapped into a DEFINE assertion, which basically means that those rules should not be matched against directly; they should just be defined. Only the ^(?&addr_spec)$ part at the end specifies what should be matched.

The rule definitions are actually not really “rules” but rather named subpatterns. In the previous (a(?1)?b) example the 1 referenced the first subpattern. With many subpatterns this obviously is impractical, thus they can be named. So (?<xyz> ...) defines a pattern with name xyz. (?&xyz) then references it.

Also, pay attention to another fact: The regular expression above uses the x modifier. This instructs the engine to ignore whitespace and to allow #-style comments. This way you can nicely format the regex, so that other people can actually understand it. (Much unlike this RFC 822 email address regex…)

The above syntax thus allows simple mappings from grammars to regular expressions:

A -> B C
A -> C D
// becomes
(?<A> (?&B) (?&C)
 | (?&C) (?&D)
)

The only catch is: Regular expressions don’t support left recursion. E.g. taking the above definition of a parameter list:

non_empty_parameter_list -> parameter
non_empty_parameter_list -> non_empty_parameter_list ',' parameter

You can’t directly convert it into a grammar based regex. The following will not work:

(?<non_empty_parameter_list>
 (?&parameter)
 | (?&non_empty_parameter_list) , (?&parameter)
)

The reason is that here non_empty_parameter_list appears as the leftmost part of its own rule definition. This is called left-recursion and is very common in grammar definitions, because the LALR(1) parsers which are usually used to parse them handle left-recursion much better than right-recursion.

But, no fear, this does not affect the power of regular expressions at all. Every left-recursive grammar can be transformed to a right-recursive one. In the above example it’s as simple as swapping the two parts:

non_empty_parameter_list -> parameter
non_empty_parameter_list -> parameter ',' non_empty_parameter_list
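The difference between the two formulations is easy to see with a small top-down matcher. The following Python sketch is an illustration under simplifying assumptions, not how PCRE works internally: the input is a pre-split token list, the hypothetical `parse_parameter` treats a parameter as a single identifier-like token, and `parse_list_left` carries an artificial depth guard so that the demonstration stops instead of recursing forever.

```python
def parse_parameter(tokens, pos):
    """Hypothetical 'parameter' rule: one identifier-like token."""
    if pos < len(tokens) and tokens[pos].isidentifier():
        return pos + 1
    return -1

def parse_list_right(tokens, pos=0):
    """Right-recursive rule: parameter, optionally ',' and another list.

    Returns the index just past the list, or -1 on failure.
    """
    end = parse_parameter(tokens, pos)   # always consumes input first
    if end == -1:
        return -1
    if end < len(tokens) and tokens[end] == ",":
        rest = parse_list_right(tokens, end + 1)
        if rest != -1:
            return rest
    return end

def parse_list_left(tokens, pos=0, depth=0):
    """Left-recursive rule tried naively: it calls itself before consuming anything."""
    if depth > 50:  # artificial guard so the demonstration terminates
        raise RecursionError("left recursion: no input consumed")
    end = parse_list_left(tokens, pos, depth + 1)  # same position again
    if end != -1 and end < len(tokens) and tokens[end] == ",":
        return parse_parameter(tokens, end + 1)
    return parse_parameter(tokens, pos)
```

With the token list `["a", ",", "b", ",", "c"]`, `parse_list_right` walks to the end of the input, while `parse_list_left` trips the guard because it re-enters itself at position 0 without ever consuming a token.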

So now it should be clear that regular expressions can match any context-free language (and thus pretty much all languages which programmers are confronted with). Only problem is: Even though regular expressions can match context-free languages nicely, they can’t usually parse them. Parsing means converting some string into an abstract syntax tree. This is not possible using regular expressions, at least not with PCRE (sure, in Perl where you can embed arbitrary code into a regex you can do pretty much everything…).

Still, the above DEFINE based regex definition has proven to be very useful to me. Usually you don’t need full parsing support, but want to just match (e.g. email addresses) or extract small pieces of data (not the whole parse tree). Most complex string processing problems can be made much simpler using grammar based regexes :)

At this point, let me point out again what I already quickly mentioned earlier: Well-formed HTML is context-free. So you can match it using regular expressions, contrary to popular opinion. But don’t forget two things: Firstly, most HTML you see in the wild is not well-formed (usually not even close to it). And secondly, just because you can, doesn’t mean that you should. You could write your software in Brainfuck, yet for some reason you don’t.

My opinion on the topic is: Whenever you need generic HTML processing, use a DOM library of your choice. It’ll gracefully handle malformed HTML and take the burden of parsing from you. On the other hand if you are dealing with specific situations a quick regular expression is often the way to go. And I have to admit: Even though I often tell people to not parse HTML with regular expressions I do it myself notoriously often. Simply because in most cases I deal with specific and contained situations in which using regex is just simpler.

Context-sensitive grammars

Now that we have covered context-free languages extensively, let’s move up one step in the Chomsky hierarchy: Context-sensitive languages.

In a context-sensitive language all production rules have the following form:

αAβ → αγβ

This mix of characters might start to look more complicated, but it is actually quite simple. At its core you still have the pattern A → γ, which was how we defined context-free grammars. The new thing now is that you additionally have α and β on both sides. Those two form the context (which also gives this grammar class the name). So basically A can now only be replaced with γ if it has α to its left and β to its right.

To make this more clear, try to interpret the following rules:

a b A -> a b c
a B c -> a Q H c
H B -> H C

The English translations would be:

Replace `A` with `c`, but only if it has `a b` to its left.
Replace `B` with `Q H`, but only if it has `a` to its left and `c` to its right.
Replace `B` with `C`, but only if it has `H` to its left.
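
To see the "only in this context" part in action, here is a minimal Python sketch (Python rather than the PHP used elsewhere in this article, purely for illustration). The rule encoding as a 4-tuple and the `apply_rule` helper are assumptions of this sketch, not part of any library:

```python
# Each rule is (left context, symbol to replace, right context, replacement).
RULES = [
    (["a", "b"], "A", [], ["c"]),      # a b A -> a b c
    (["a"], "B", ["c"], ["Q", "H"]),   # a B c -> a Q H c
    (["H"], "B", [], ["C"]),           # H B -> H C
]

def apply_rule(symbols, rule):
    """Return every sequence obtained by applying `rule` once where its context fits."""
    left, target, right, replacement = rule
    results = []
    for i, sym in enumerate(symbols):
        if sym != target:
            continue
        # The symbols directly before and after position i must equal the context.
        has_left = symbols[max(0, i - len(left)):i] == left
        has_right = symbols[i + 1:i + 1 + len(right)] == right
        if has_left and has_right:
            results.append(symbols[:i] + replacement + symbols[i + 1:])
    return results
```

Calling `apply_rule(["a", "B", "c"], RULES[1])` yields one result, while `apply_rule(["a", "B", "d"], RULES[1])` yields none, because the right context is missing.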

Context-sensitive languages are something that you will rarely encounter during “normal” programming. They are mostly important in the context of natural language processing, as natural languages are clearly not context-free: words have different meanings depending on context. But even in natural language processing people usually work with so-called “mildly context-sensitive languages”, as they are sufficient for modeling the language but can be parsed much faster.

To understand just how powerful context-sensitive grammars are let’s look at another grammar class, which has the exact same expressive power as the context-sensitive ones: Non-contracting grammars.

With non-contracting grammars every production rule has the form α -> β where both α and β are arbitrary symbol strings with just one restriction: The number of symbols on the right hand side is not less than on the left hand side. Formally this is expressed in the formula |α| <= |β| where |x| denotes the length of the symbol string.

So non-contracting grammars allow rules of any form as long as they don’t shorten the input. E.g. A B C -> H Q would be an invalid rule as the left hand side has three symbols and the right hand side only two. Thus this rule would be shortening (or “contracting”). The reverse rule H Q -> A B C on the other hand would be valid, as the right side has more symbols than the left, thus being lengthening.
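
The length condition is simple enough to check mechanically. A minimal Python sketch, assuming rules are written as space-separated symbol strings (the helper name `is_non_contracting` is made up for this illustration):

```python
def is_non_contracting(lhs, rhs):
    """True if the rule lhs -> rhs satisfies |lhs| <= |rhs|, counting symbols."""
    return len(lhs.split()) <= len(rhs.split())

print(is_non_contracting("A B C", "H Q"))  # three symbols shrink to two
print(is_non_contracting("H Q", "A B C"))  # two symbols grow to three
```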

This equivalence relationship of context-sensitive grammars and non-contracting grammars should make pretty clear that you can match near-everything with a context-sensitive grammar. Just don’t shorten :)

To get an impression of why both grammar kinds have the same expressive power look at the following transformation example:

// the non-contracting grammar
A B -> C D
// can be transformed to the following context-sensitive grammar
A B -> A X
A X -> Y X
Y X -> Y D
Y D -> C D
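
The four context-sensitive rules above can be replayed step by step. The following Python sketch is only an illustration: the `derive` and `changed_positions` helpers are assumptions of this example. It applies each rule once, in order, and lets you confirm that every step rewrites exactly one symbol while its neighbour acts as context:

```python
STEPS = [
    ("A B", "A X"),
    ("A X", "Y X"),
    ("Y X", "Y D"),
    ("Y D", "C D"),
]

def derive(start, steps):
    """Apply each rule once, in order, at its first occurrence; return all intermediate forms."""
    current = start.split()
    history = [" ".join(current)]
    for lhs, rhs in steps:
        lhs_syms, rhs_syms = lhs.split(), rhs.split()
        for i in range(len(current) - len(lhs_syms) + 1):
            if current[i:i + len(lhs_syms)] == lhs_syms:
                current = current[:i] + rhs_syms + current[i + len(lhs_syms):]
                break
        else:
            raise ValueError("rule %s -> %s does not apply" % (lhs, rhs))
        history.append(" ".join(current))
    return history

def changed_positions(lhs, rhs):
    """Count the positions at which the two sides of a same-length rule differ."""
    return sum(1 for a, b in zip(lhs.split(), rhs.split()) if a != b)

print(derive("A B", STEPS))
```

The end result is the same as applying the single non-contracting rule A B -> C D, just spread over four smaller steps.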

Anyways, back to regular expressions. Can they match context-sensitive languages too?

This time I can’t give you a definite answer. They certainly can match some context-sensitive languages, but I don’t know whether they can match all of them.

An example of a context-sensitive language that can be easily matched using regex is a modification of the context-free language {a^n b^n, n>0} mentioned above. When you change it into {a^n b^n c^n, n>0}, i.e. some number of as followed by the same number of bs and cs, it becomes context-sensitive.

The PCRE regex for this language is this:

/^(?=(a(?-1)?b)c)a+(b(?-1)?c)$/

If you ignore the (?=...) assertion for now you’re left with a+(b(?-1)?c). This checks that there is an arbitrary number of as, followed by the same number of bs and cs. The (?-1) is a relative subpattern reference and means “the last defined subpattern”, which is (b(?-1)?c) in this case.

The new thing now is the (?=...), which is a so-called zero-width lookahead assertion. It checks that the following text matches the pattern, but it does not actually consume the text. Thus the text is basically checked against both patterns at the same time. The a+(b(?-1)?c) part verifies that the number of bs and cs is the same and the (a(?-1)?b)c part checks that the number of as and bs is the same. Both patterns together thus ensure that the number of all three characters is the same.
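
The "checked against both patterns at the same time" idea does not depend on recursion. Python's `re` module has no subpattern recursion, so the following sketch uses a deliberately simpler pair of conditions (an assumption of this example, not the regex above): the string must consist of as followed by bs, and it must have even length.

```python
import re

# A lookahead succeeds without moving the match position.
assert re.match(r"(?=ab)", "ab").end() == 0

# Condition 1 (inside the lookahead): only a's followed by b's, up to the end.
# Condition 2 (the consuming part): an even number of characters.
BOTH = re.compile(r"(?=a*b*\Z)(?:..)*\Z")

def as_then_bs_of_even_length(text):
    return BOTH.match(text) is not None
```

Because the lookahead consumes nothing, the second condition starts matching from the same position as the first, so a string is accepted only if it passes both.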

In the above regex you can already see how the concept of “context” is realized in regular expressions: Using assertions. If we get back to the definition of a context-sensitive grammar, you could now say that a production rule of type

αAβ → αγβ

can be converted into the following regex DEFINE rule:

(?<A> (?<= α ) γ (?= β ) )

This would then say that A is γ, but only if it has α to its left and β to its right.

Now, the above might look as if you can easily convert a context-sensitive grammar into a regular expression, but it’s not actually true. The reason is that lookbehind assertions ((?<= ... )) have one very significant limitation: They have to be fixed-width. This means that the length of the text matched by the assertion has to be known in advance. E.g. you can write (?<= a(bc|cd) ), but you can’t write (?<= ab+). In the first case the assertion matches exactly three characters in any case, thus being fixed-width. In the second case on the other hand the assertion could match ab, abb, abbb etc. All of those have different lengths. Thus the engine can’t know where it should start to match them and as such they are simply disallowed.
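
Python's built-in `re` module has the same fixed-width restriction on lookbehind, which makes it easy to try out. This is a sketch in Python rather than PCRE, and the `compiles` helper is made up for the illustration:

```python
import re

def compiles(pattern):
    """True if the regex engine accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Both alternatives are two characters long, so the lookbehind is three wide.
fixed = r"(?<=a(?:bc|cd))x"
# b+ can match any number of b's, so the width is not known in advance.
variable = r"(?<=ab+)x"

print(compiles(fixed), compiles(variable))
```

The first pattern compiles and finds the x in "abcx"; the second is rejected at compile time, before any text is matched.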

This pretty much blows the easy conversion of context-sensitive grammars to regex. Pretty much all such grammars require variable-width lookbehind assertions.

But the fact that there is no direct context-sensitive grammar to regex conversion doesn’t by itself mean that regular expressions can’t match all of them. E.g. the above {a^n b^n c^n, n>0} language also has a grammar that would require variable-width lookbehind assertions. But we can still avoid using them as regex isn’t bound to specifying rules in a grammar. Maybe the same is possible for all other context-sensitive grammars too. I honestly don’t know.

So, what can we say here? Regex can match at least some context-sensitive languages, but it’s unknown whether it can match all of them.

Unrestricted grammars

The next grammar class in the Chomsky hierarchy is the class of unrestricted grammars. The set of languages one can form using them is the set of all recursively enumerable languages.

There is little to say about unrestricted grammars as they are, well, unrestricted. Production rules for unrestricted grammars have the form α -> β, where α and β are symbol strings with no restrictions whatsoever.

So basically unrestricted grammars remove the “non-contracting” part of the non-contracting grammars. Thus for them A B C -> H Q would be a valid rule, even though previously it wasn’t.

How powerful are unrestricted grammars exactly? They are as powerful as it gets: They are Turing-complete. There even is a “programming language” which is based on unrestricted grammars: Thue. As it is Turing-complete it can do everything that other languages can do.

One implication of being Turing-complete is that checking whether a certain string adheres to some grammar is undecidable for the general case.

Sadly I can’t say anything whatsoever about how regular expressions and unrestricted grammars relate. Heck, I couldn’t even find an example of a meaningful unrestricted grammar (that wasn’t non-contracting).

But now that we started talking about Turing-completeness we get to another point:

Regular expressions with backreferences are NP-complete

There is another very powerful regular expression feature that I have not mentioned so far: backreferences.

E.g. consider this very simple regex:

/^(.+)\1$/

(.+) matches some arbitrary text and \1 matches the same text. In general \n means “whatever the nth subpattern matched”. E.g. if (.+) matched foo, then \1 will also match only foo and nothing else. Thus the expression (.+)\1 means “Some text followed by a copy of itself”.

What this simple regex matches is called the “copy language” and is another typical example of a context-sensitive language.
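
Backreferences work the same way in most engines, so the copy language is easy to try. A minimal Python sketch (the `is_copy` helper name is an assumption of this example; `fullmatch` plays the role of the ^ and $ anchors):

```python
import re

COPY = re.compile(r"(.+)\1")

def is_copy(text):
    """True if text is some non-empty string followed by a copy of itself."""
    return COPY.fullmatch(text) is not None

for candidate in ["foofoo", "foobar", "abab", "aba"]:
    print(candidate, is_copy(candidate))
```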

Similarly you can match the other example grammars from above using backreferences:

# {a^n b^n, n>0} (context-free)
/^ (?: a (?= a* (\1?+ b) ) )+ \1 $/x
# {a^n b^n c^n, n>0} (context-sensitive)
/^ (?: a (?= a* (\1?+ b) b* (\2?+ c) ) )+ \1 \2 $/x

Explaining how these work is outside the scope of this article, but you can read an excellent explanation on StackOverflow.

As you can see, the mere addition of backreferences (without subpattern recursion support) already adds a lot of power to regular expressions. The addition is actually so powerful that it makes matching of regular expressions an NP-complete problem.

What does NP-complete mean? NP-complete is a computational complexity class for decision problems in which many “hard” problems fall. Some examples of NP-complete problems are the traveling salesman problem (TSP), the boolean satisfiability problem (SAT) and the knapsack problem (BKP).

One of the main conditions for a problem being NP-complete is that every other NP problem is reducible to it. Thus all NP-complete problems are basically interchangeable. If you find a fast solution to one of them, you got a fast solution to all of them.

So if somebody found a fast solution to an NP-complete problem, pretty much all of the computationally hard problems of humanity would be solved in one stroke. This would mean the end of civilisation as we know it.

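To prove that matching regular expressions with backreferences is indeed NP-complete one can take one of the known NP-complete problems and show that it can be solved using regular expressions. Strictly speaking this only covers the “at least as hard as” half of the definition; the other half, that the problem itself lies in NP, is not shown here. As an example I chose the 3-CNF SAT problem: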

3-CNF SAT stands for “3-conjunctive normal form boolean satisfiability problem” and is quite easy to understand. You get a boolean formula of the following form:

 (!$a || $b || $d)
&& ( $a || !$c || $d)
&& ( $a || !$b || !$d)
&& ( $b || !$c || !$d)
&& (!$a || $c || !$d)
&& ( $a || $b || $c)
&& (!$a || !$b || !$c)

Thus the boolean formula is made up of a number of clauses separated by ANDs. Each of those clauses consists of three variables (or their negations) separated by ORs. The 3-CNF SAT problem now asks whether there exists a solution to the given boolean formula (such that it is true).

The above boolean formula can be converted to the following regular expression:

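// Groups 1-4 stand for $a, $b, $c, $d: "x" means true, "" means false.
$regex = '/^
 (x?)(x?)(x?)(x?) .* ;
 (?: x\1 | \2 | \4 ),
 (?: \1 | x\3 | \4 ),
 (?: \1 | x\2 | x\4 ),
 (?: \2 | x\3 | x\4 ),
 (?: x\1 | \3 | x\4 ),
 (?: \1 | \2 | \3 ),
 (?: x\1 | x\2 | x\3 ),
$/x';
// One "x," per clause; each clause has to consume exactly one x.
$string = 'xxxx;x,x,x,x,x,x,x,';
var_dump(preg_match($regex, $string, $matches));
var_dump($matches);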

If you run this code you’ll get the following $matches result:

array(5) {
 [0]=> string(19) "xxxx;x,x,x,x,x,x,x,"
 [1]=> string(1) "x"
 [2]=> string(1) "x"
 [3]=> string(0) ""
 [4]=> string(0) ""
}

This means that the above formula is satisfied if $a = true, $b = true, $c = false and $d = false.

The regular expression works with a very simple trick: For every 3-clause the string contains an x, which has to be matched. So if you have something like (?: \1 | x\3 | \4 ), in the regex, then the string can only be matched if either \1 is x (true), \3 is the empty string (false) or \4 is x (true).

The rest is left up to the engine. It’ll try out different ways of matching the string until it either finds a solution or has to give up.
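
The same trick can be automated for any 3-CNF formula. The following is a Python sketch of the construction described above (Python instead of PHP, for illustration only). The clause encoding, where a positive integer n stands for variable n and a negative one for its negation, and the helper names are assumptions of this sketch:

```python
import re

def clause_to_regex(clause):
    """Turn e.g. (-1, 2, 4) into (?:x\\1|\\2|\\4), as in the regex above."""
    alternatives = []
    for literal in clause:
        if literal > 0:
            alternatives.append("\\%d" % literal)      # true  -> group holds "x"
        else:
            alternatives.append("x\\%d" % -literal)    # false -> group is empty
    return "(?:" + "|".join(alternatives) + "),"

def solve(num_vars, clauses):
    """Return a satisfying assignment as a list of booleans, or None."""
    pattern = "(x?)" * num_vars + ".*;" + "".join(clause_to_regex(c) for c in clauses)
    subject = "x" * num_vars + ";" + "x," * len(clauses)
    match = re.fullmatch(pattern, subject)
    if match is None:
        return None
    return [group == "x" for group in match.groups()]

# The formula from above, with $a..$d numbered 1..4.
FORMULA = [(-1, 2, 4), (1, -3, 4), (1, -2, -4), (2, -3, -4),
           (-1, 3, -4), (1, 2, 3), (-1, -2, -3)]

print(solve(4, FORMULA))
```

Keep in mind that the engine finds the assignment by backtracking, so the running time can grow exponentially with the number of variables. That is exactly what one would expect from an NP-complete problem.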

Wrapping up

As the article was quite long, here is a summary of the main points:

  • The “regular expressions” used by programmers have very little in common with the original notion of regularity in the context of formal language theory.
  • Regular expressions (at least PCRE) can match all context-free languages. As such they can also match well-formed HTML and pretty much all other programming languages.
  • Regular expressions can match at least some context-sensitive languages.
  • Matching of regular expressions with backreferences is NP-complete. As such you can solve any other NP problem using regular expressions.

But don’t forget: Just because you can, doesn’t mean that you should. Processing HTML with regular expressions is a really bad idea in some cases. In other cases it’s probably the best thing to do.

Just check what the easiest solution to your particular problem is and use it. If you choose to solve a problem using regular expressions, don’t forget about the x modifier, which allows you to nicely format your regex. For complex regular expressions also don’t forget to make use of DEFINE assertions and named subpatterns to keep your code clean and readable.

That’s it.

3483 days ago