It's true. It turns out you can, in fact, add C-style for loops into Python. The way to get there however, was long and painful all the way to the end.
Regardless, I've learned many things (some of which I hope nobody ever uses in production), and I'm here to share with you all the gruesome details. I hope you find it helpful (or at the very least, entertaining).
The Path to get there
I had to try 3 different approaches in order to finally get to the image that you see at the top. And hence, I've sectioned the blog into 3 parts:
- Part 1: The "simple" implementation
- Part 2: Making our own language inside Python
- Part 3: The "Truly Cursed" way
But before all of that, let's answer the big question: "why?".
Part 0: The beginning
In the beginning, I had a dumb idea:
"Can I possibly put the 3-statement for-loop from C into Python?"
I mocked up an idea that I thought I could make, and posted it on twitter:
... and immediately, I got some intriguing reactions:
So I decided it'd be worthwhile to try and make this a reality. Simple as that.
Part 1: The "simple" implementation
I wanted it to look like a regular for-loop from C as much as possible:
for (int i = 0; i < 10; i++) {
printf("%d", i);
}
But there's a couple big problems:
-
The builtin
for
syntax requires you to have anin
clause:for <variable> in <some expression that gives an iterable>: ...
Which is pretty restrictive. So our second best bet is to implement a function instead -- one that takes in three arguments. I went with
_for(x, y, z)
.For the
for
to be a block, I needed a generic block kind of statement, one that allows us to put arbitrary code inside it: so I chose awith
block:with
blocks can be customized with__enter__
and__exit__
. -
The second problem is much bigger: The original
for
loop syntax takes in statements. Notice thatint i = 0
andi++
are statements, ones which declare and assign a variable respectively.Whereas, Python functions take in "expressions", things that return a value. This one is much harder to hack in, but I have one idea to save the day:
We need to do assignment, but we are limited to expressions. Hmmm... where have I heard of this before... oh right. Assignment expressions.
Python 3.8 has come to save the day. Let's try it out.
Hence, I started with this rough idea:
with _for(i := var(0), i < 10, i + 2):
print(i)
Yeah, it's not the greatest syntax of all time, but you have to be confined to Python syntax when writing Python (... or so I thought, more on that later).
The first argument i := var(0)
is what will initialise our i
variable. But,
it is no ordinary variable, mind you. For the second and third argument to work,
i < 10
and i + 2
should behave as the condition and increment part of the
for loop. Thankfully, By overriding functions like __lt__
and __add__
, we
can customize how this i
variable behaves when we do these comparisons and
additions.
Now, all that's left is to implement this _for
and var
thing.
Implementing var
We want var
to be able to do these 3 things at least:
- Support storing a value inside it when initializing, i.e.
var(0)
. - Support creating conditions with comparisons:
i < 10
should return a "conditional" type, which we will use later. - Support creating the "increment", i.e.
i + 2
should return an "increment" type, which we will also use later.
So I started with a simple var
class:
class var:
def __init__(self, value) -> None:
self.value = value
def __repr__(self) -> str:
return repr(self.value)
i = var(0)
print(i) # prints 0
I then added comparison support, by making a _Comparison
type, and
implementing __lt__
, such that doing i < something
stores that information:
class _Comparison:
def __init__(self, var, op, value):
self.var = var
self.op = op
self.value = value
def __repr__(self) -> str:
return f"Comparing var({self.var!r}) {self.op} {self.value!r}"
class var:
def __init__(self, value):
self.value = value
def __repr__(self) -> str:
return repr(self.value)
def __lt__(self, value):
return _Comparison(var=self, op="<", value=value)
Pretty simple so far:
>>> i = var(0)
>>> print(i)
0
>>> comp = i < 10
>>> print(comp)
Comparing var(i) < 10
>>> type(comp)
<class '__main__._Comparison'>
>>> comp.op
'<'
>>> comp.value
10
This _Comparison
objects stores all information about the comparison, and this
gets passed down to the _for()
function.
We'll do the same for the increment part:
class _Increment:
def __init__(self, var, op, value):
self.var = var
self.op = op
self.value = value
def __repr__(self) -> str:
return f"Incrementing var({self.var!r}) with {self.op}{self.value!r}"
class var:
[...]
def __add__(self, value):
return _Increment(var=self, op="+", value=value)
>>> x = var(10)
>>> x + 2
Incrementing var(10) with +2
Alright! Now to the next part.
Implementing _for
_for()
must return a context manager, because we're using it with a with
statement, so we'll use the contextlib.contextmanager
decorator to
quickly make one:
from contextlib import contextmanager
@contextmanager
def _for(variable, comparison, increment):
print(f"We initialized the variable as {variable}.")
print(
f"We'll increment the value by {increment.op}{increment.value} "
f"each time, until it ~~stays~~ {comparison.op} {comparison.value}."
)
yield
with _for(i := var(0), i < 10, i + 2):
...
And this way, we should have all the information needed:
$ python cursedfor.py
We initialized the variable as 0.
We'll increment the value by +2 each time, until it stays < 10.
Let's do the looping now, shall we?
from contextlib import contextmanager
@contextmanager
def _for(variable, comparison, increment):
value = variable.value
while value < comparison.value:
print(value)
value += increment.value
yield
with _for(i := var(0), i < 10, i + 2):
...
And the output:
$ python cursedfor.py
0
2
4
6
8
It works! Let's try a couple other cases:
>>> with _for(i := var(1), i < 10, i + 3):
... print(f"{i = })
...
1
4
7
i = 1
Great! ... wait a second.
Generalizing
We're not actually executing the body of the for loop as many times, we're
simply printing out value
, which is hardcoded. The body is still only executed
once. That's not what we wanted at all!
Let's try to hack around this.
Stack manipulation
What now? Does the with
statement create its own stack, so we can, in theory,
find the body of the with
statement and execute that N times?
Let's check using the dis
module:
>>> import dis
>>> dis.dis('''
... with x:
... pass
... ''')
2 0 LOAD_NAME 0 (x)
2 SETUP_WITH 16 (to 20)
4 POP_TOP
3 6 POP_BLOCK
8 LOAD_CONST 0 (None)
10 DUP_TOP
12 DUP_TOP
14 CALL_FUNCTION 3
16 POP_TOP
18 JUMP_FORWARD 16 (to 36)
>> 20 WITH_EXCEPT_START
22 POP_JUMP_IF_TRUE 26
24 RERAISE
>> 26 POP_TOP
28 POP_TOP
30 POP_TOP
32 POP_EXCEPT
34 POP_TOP
>> 36 LOAD_CONST 0 (None)
38 RETURN_VALUE
Nope. Since there's only one block of output, that shows there's no separate
code object for a with
statement's body. Compare this to a function:
>>> dis.dis('''
... def foo():
... pass
...
... foo()
... ''')
2 0 LOAD_CONST 0 (<code object foo at 0x7f318f54bf50, file "<dis>", line 2>)
2 LOAD_CONST 1 ('foo')
4 MAKE_FUNCTION 0
6 STORE_NAME 0 (foo)
5 8 LOAD_NAME 0 (foo)
10 CALL_FUNCTION 0
12 POP_TOP
14 LOAD_CONST 2 (None)
16 RETURN_VALUE
Disassembly of <code object foo at 0x7f318f54bf50, file "<dis>", line 2>:
3 0 LOAD_CONST 0 (None)
2 RETURN_VALUE
You can see there's two disassemblies, of two separate code objects. So that's a no-go.
But hey, we can still access the stack frame...
>>> import inspect
>>> frame = inspect.currentframe()
And I could read the code object inside it...
>>> frame.f_code
<code object <module> at 0x7f03e9d90f50, file "<stdin>", line 1>
>>> frame.f_code.co_code
b'e\x00\xa0\x01\xa1\x00Z\x02d\x00S\x00'
>>> frame.f_lasti
10
f_lasti
also tells us which of those bytes (instructions) was last executed,
which means the code after it must contain the body of the with
statement!
So if we could just extract the bytecode and then run the body manually N times, we could...
Giving up
Here's one (obvious) piece of advice -- if your current approach devolves into you trying to rewrite the Python interpreter, you have messed up.
Let's backtrack, and try to think of some other possibility.
Part 2: Making our own language inside Python
Thanks to another such project that I made in the past, I learned the fact that you can add arbitrary semantics into Python -- essentially, if something is valid Python code, you could transform it to suit your needs. All you need is this one simple trick: AST manipulation.
Here's what I mean. You can take any piece of code:
>>> code = '2 + 3'
You can parse it (as long as it's valid "Syntax")
>>> import ast
>>> tree = ast.parse(code)
>>> print(ast.dump(tree, indent=2))
Module(
body=[
Expr(
value=BinOp(
left=Constant(value=2),
op=Add(),
right=Constant(value=3)))],
type_ignores=[])
You can change what the code actually means:
# Substituting addition with subtraction
>>> tree.body[0].value.op = ast.Sub()
And you can generate code back out of it:
>>> ast.unparse(tree)
'2 - 3'
Which you're free to execute as you please. This example is really simplified, but with enough transformation, you can turn any Python code into anything else.
So let's do exactly that.
Building a for-loop transformer
What we want the following code:
with _for(i := 0, i < 10, i + 2):
<the body>
into the following:
i = 0
while i < 10:
<the body>
i += 2
We can break this down into three distinct parts:
- Find all
with _for()
statements inside a code block - Transform the
with _for()
into the initializer andwhile
statements. - Replace all the instances with the two statements instead.
Let's do them in that order:
Find the cursed loops
We'll be using an ast.NodeTransformer
class which helps us find and replace
AST nodes inside the code easily. We'll be implementing its generic_visit()
method, to find all AST nodes that have a body
, and find cursed _for
blocks
inside them.
class CursedForTransformer(ast.NodeTransformer):
def generic_visit(self, node: ast.AST) -> ast.AST:
super().generic_visit(node)
if hasattr(node, "body") and isinstance(node.body, list):
new_body = []
for stmt in node.body:
if isinstance(stmt, ast.With):
if any(
self.is_cursed_for_call(expr.context_expr)
for expr in stmt.items
):
item_replacements = self.replace_cursed_for(stmt)
new_body.extend(item_replacements)
continue
# If it wasn't a cursed `with` node, add it to body.
new_body.append(stmt)
node.body = new_body
return node
If you want to deep dive into what's going on here with
NodeTransformer
,generic_visit
etc., check out this post.
Essentially:
-
We're finding all blocks in our code, by finding
body
attributes:if hasattr(node, "body") and isinstance(node.body, list):
-
Then we're trying to find
with
statements, which contain the cursed for-loops:if isinstance(stmt, ast.With): if any( self.is_cursed_for_call(expr.context_expr) for expr in stmt.items ):
-
And if found, we call
self.replace_cursed_for(stmt)
to replace it with the initializer and while loop, in thenew_body
:item_replacements = self.replace_cursed_for(stmt) new_body.extend(item_replacements) continue
-
Otherwise, if it's not a cursed for loop, we just add the node to the body, unchanged:
# If it wasn't a cursed `with` node, add it to body. new_body.append(stmt)
And we update the node's body, and return.
Here's the code for self.is_cursed_for_call
, it just checks if the function is
called _for
:
@staticmethod
def is_cursed_for_call(node: ast.AST) -> bool:
return (
isinstance(node, ast.Call)
and isinstance(node.func, ast.Name)
and node.func.id == "_for"
)
Implementing the cursed-for transformer
We have yet to see how self.replace_cursed_for(stmt)
works. In essence it just
has to return a list of two nodes:
- An initializer (like
i = 0
), - And a
while
loop, containing the condition and thewith
body.
Here's the function, simplified and documented for explanation:
def replace_cursed_for(self, node: ast.With) -> list[ast.AST]:
# Get the `_for(init, condition, increment)` node
cursed_for_call: ast.Call = node.items[0].context_expr
# Extract `init`, `condition` and `increment`
init_node, condition_node, increment_node = cursed_for_call.args
# Turn the init expression into a statement, i.e. from `i := 0` to `i = 0`
init_variable: ast.Name = init_node.target
init_statement = ast.Assign(targets=[init_variable], value=init_node.value)
# Turn the increment expression into a statement, i.e. `i + 2` to `i += 2`
increment_statement = ast.AugAssign(
target=init_variable,
op=increment_node.op,
value=increment_node.right,
)
# Add the increment statement at the end of the while loop's body
block_body = [*node.body, increment_statement]
while_statement = ast.While(test=condition_node, body=block_body, orelse=[])
# Return the initializer and while loop
return [init_statement, while_statement]
To show it in action, here's the transformation it does:
>>> tree = ast.parse('''
... def foo():
... with _for(i := 10, i <= 0, i - 3):
... print("The value is:", i)
... ''')
>>> modified_tree = CursedForTransformer().visit(tree)
>>> print(ast.unparse(modified_tree))
>>> ast.fix_missing_locations(modified_tree)
<ast.Module object at 0x7f7a18bebf70>
>>> print(ast.unparse(modified_tree))
def foo():
i = 10
while i <= 0:
print('The value is:', i)
i -= 3
It works! We can finally use this to run our cursed for loops!
If you wish to try this out yourself, the complete code (with much better error handling) is present in cursedfor.py in the GitHub repository, and you can download it and run the REPL.
How the cursed REPL works
Python's standard library never ceases to amaze me. Even for a usecase as weird
as creating a custom REPL, Python lets you do that by itself. Specifically,
there exists a code
module, which contains primitives such as
code.InteractiveConsole
.
InteractiveConsole
is a pre-built Python REPL, which by default works the same
way you would expect Python's interactive console to work:
import code
class CursedConsole(code.InteractiveConsole):
pass
CursedConsole().interact(banner=f"Cursed Python REPL", exitmsg="")
$ python cursedfor.py
Cursed Python REPL
>>> x = 5
>>> x
5
>>>
Completely normal.
But, you can override any of its functions to change their functionality.
In our case, we can change the runsource
method, which is responsible for
running a single statement or expression at a time. Here's the full
implementation:
class CursedConsole(code.InteractiveConsole):
def runsource(
self,
source: str,
filename: str = "<input>",
symbol: str = "single",
) -> bool:
# First, check if it could be incomplete input, return True if it is.
# This will allow it to keep taking input
with suppress(SyntaxError, OverflowError):
if code.compile_command(source) == None:
return True
try:
tree = ast.parse(source, filename, mode=symbol)
CursedForTransformer().visit(tree)
ast.fix_missing_locations(tree)
except (ValueError, SyntaxError):
# Let the original implementation take care of incomplete input / errors
return super().runsource(source, filename, symbol)
code_obj = compile(tree, filename, mode=symbol)
self.runcode(code_obj)
return False
The code.compile_command
part returning True
is important to support
multiline statements in the REPL, like these:
>>> def foo():
... print(42)
...
>>>
Support for multiline input and exceptions comes for free because we used this class.
Apart from that, the interesting part is this:
try:
tree = ast.parse(source, filename, mode=symbol)
CursedForTransformer().visit(tree)
except (ValueError, SyntaxError):
[...]
code_obj = compile(tree, filename, mode=symbol)
self.runcode(code_obj)
We simply modified the AST tree before passing it on to self.runcode
.
This is actually amazing. Using this same class and a custom NodeTransformer
,
you can build basically any DSL or mini language that uses Python's syntax.
If you want to try doing it yourself, the starter code is present here.
Is that it?
I was finally done with this project, but it always stayed in the back of my mind. I just felt that I hadn't done it justice.
And so, a couple months later, I got to know of a much, much better way to do this. And it was everything that I could've ever wanted.
Part 3: The "Truly Cursed" way
I stumbled upon this library, which tries to add braces to Python instead
of indentation. And although the implementation is questionable, it gave me
exactly the tool I needed to finish this project: codecs
.
What the heck is a codec?
A "codec" in Python refers to the tools that let you convert the text encoding of a source file into a Python string, and (in case of source code) can run the code.
A lot of pre-built codecs exist, such as utf-8
, cp1252
, and ascii
, so that
people can write their code in whichever file encoding they prefer to use, and
Python will pre-process the file into the text format that it understands,
before trying to run it.
There's just one tiny detail: You can write your own custom codecs.
Source translation
So here's the new idea: What if instead of AST manipulation, we manipulated the source text directly before it runs?
Yeah, I know, it's probably a bad idea. But I think it can work. And you can't stop me, so you might as well see what I have to say.
The approach in my mind is pretty simple:
- Find the
for (init; condition; increment):
pattern in the source code - Substitute that text with two lines:
init
andwhile condition:
- Append the
increment
statement at the end of thefor
body.
Let's try it out then.
Time for regex hacks
To create the encoding, we first need to implement a transformation function. I chose to implement one that works on a list of Python source lines. Here's the boilerplate:
import codecs
def _transform_cursed_for(lines: list[str]) -> list[str]:
new_source = []
index = 0
while index < len(lines):
line = lines[index]
# Do stuff with the line
new_source.append(line)
index += 1
return new_source
def transform_cursed_for(source: str) -> str:
lines = source.splitlines()
new_lines = _transform_cursed_for(lines)
return "\n".join(new_lines)
codecs.register(
{'cursed_for': codecs.CodecInfo(
name='cursed_for',
encode=utf_8.encode,
decode=transform_cursed_for,
)}.get
)
Now for the details on how we match and replace for (x; y; z):
lines:
-
First, you match the syntax:
beginning_with_for_regex = re.compile(r'^\s*for\b') if beginning_with_for_regex.match(line): ...
-
If we find such a line, we find the three parts and the indentation of the inner block:
indent_regex = re.compile(r'^\s*') cursed_for_regex = re.compile(r'^\s*for\s*\((.+?);(.+?);(.+?)\):(.*)$') match = cursed_for_regex.match(line) initializer, condition, increment = match[1], match[2], match[3] # The indentation of the `for ():` line for_indent_level = indent_regex.match(line).group() # Read the next line next_line = lines[index] index += 1 # The indentation of the inner block body_indent_level = indent_regex.match(next_line).group()
-
Now, we collect all the lines which are inside the
for
block, by finding all lines which are indented at the same level as the first line:for_body_lines = [next_line] while index < len(lines): next_line = lines[index] if not next_line.startswith(body_indent_level): break for_body_lines.append(next_line) index += 1
-
At the end, we add in the two initializer and
while
statemnets:initializer_stmt = f'{for_indent_level}{initializer.strip()}' while_stmt = f'{for_indent_level}while {condition.strip()}:' increment_stmt = f'{body_indent_level}{increment.strip()}' new_source.append(initializer_stmt) new_source.append(while_stmt) new_source.extend(for_body_lines) new_source.append(increment_stmt)
So that's how I did it. And to my own surprise, it works!
code = b'''
for (i = 0; i < 10; i += 2):
print(i)
print("done")
'''
print(code.decode('cursed_for'))
$ python cursedfor.py
i = 0
while i < 10:
print(i)
i += 2
print("done")
Is this useful, at all? Probably not. Is it cursed? YES.
Can this knowledge be useful in making actually good Python packages? You tell me!
Nested cursed loops
The current method does work, but it breaks if you try to give it nested loops:
for (i = 0; i < 10; i += 3):
for (j = i; j < 10; j += 3):
print(i, j)
This is because we don't transform the body of the for-loop, which we should.
Recursively.
Simply call _transform_cursed_for()
on the collected lines of the inner body,
before appending it to the new body:
[...]
new_source.extend(_transform_cursed_for(for_body_lines))
And that's it!
The complete code for this approach, with much better error handling, etc. is present in this folder in the GitHub repository.
The making of perhaps the most cursed Python package
There's just one more thing left to do: Make this an installable package.
And honestly, that part of the process is a lot more black-magic hackery than what we've even seen so far. I basically copied what this other project called future-fstrings does, and you can watch this video if you're really curious.
In essence, we need to register this encoding at Python's startup time. And we
do that by creating a .pth
file which must be stored into the site-packages
folder, which will get auto imported when Python starts up.
And with that, our package is complete.
Test drive
You can now install the package:
pip install cursed-for
And run your cursed python files with a # coding
comment at the top:
# coding: cursed-for
for (i = 5; i < 10; i += 2):
print(i)
$ python x.py
5
7
9
Can you do this in a REPL? Can you debug the output? Absolutely! Check out the repository's README for more info.
And that's it from me. I... hope you learned something useful? I sure did.