The trick of doing the equivalent of a switch() at the end of every instruction, to get better branch prediction, was totally mind-blowing.
- to implement its VM, Python has a giant eval loop (ceval.c)
- optimization 1: remove SET_LINENO, which was executed only to support tracing; instead use a line-number table (co_lnotab) from which the actual line numbers are recovered only when they're needed
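A minimal sketch of the lazy line-number idea, assuming a simplified table modeled on CPython 2.x's co_lnotab: byte pairs of (bytecode offset delta, line delta). The function name addr_to_line and the exact layout are hypothetical. The point is that nothing runs per source line at execution time; the table is decoded only when a tracer or traceback asks for a line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified line table modeled on CPython 2.x co_lnotab:
 * a flat array of (bytecode_offset_delta, line_delta) byte pairs.
 * Nothing is executed per line at run time; the table is decoded only
 * when someone (a tracer, a traceback) needs a line number. */
int addr_to_line(const unsigned char *table, size_t len_bytes,
                 int first_line, int addr_query)
{
    assert(len_bytes % 2 == 0);
    int line = first_line;
    int addr = 0;
    for (size_t i = 0; i < len_bytes; i += 2) {
        addr += table[i];          /* offset where the next line's code starts */
        if (addr > addr_query)
            break;                 /* the query is before that line begins */
        line += table[i + 1];
    }
    return line;
}
```

For a function starting at line 10 whose table is {6, 1, 6, 2}, offsets 0 to 5 map to line 10, offsets 6 to 11 to line 11, and offset 12 onward to line 13.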
- optimization 2: non-standard gotos. GCC has an extension called "labels as values" (computed goto), which lets each opcode handler jump directly to the next instruction's handler instead of going back to the top of the ceval loop. That isn't really different from jumping back to the switch(), but it is more predictable: every handler has its own indirect jump, so the branch predictor can learn the usual opcode patterns (which opcode tends to follow which). Result: a freaking 17% speedup!! holy shit, this is freaking awesome
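To make the dispatch difference concrete, here is a toy stack VM written twice: once with a central switch, once with GCC's labels-as-values extension (Clang supports it too). The opcodes and function names are made up for illustration; this is a sketch of the technique, not CPython's ceval.c code.

```c
#include <assert.h>

/* Hypothetical toy bytecode: one int per instruction,
 * OP_PUSH is followed by its operand. No bounds checks in this sketch. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Classic dispatch: every handler goes back to one shared switch,
 * so a single indirect branch has to predict every opcode transition. */
int run_switch(const int *code)
{
    int stack[64], sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}

/* Threaded dispatch with "labels as values": each handler ends with its own
 * indirect jump to the next handler, so the predictor can learn per-opcode
 * patterns (e.g. what usually follows a PUSH). */
int run_goto(const int *code)
{
    static void *const targets[] = {
        [OP_PUSH] = &&do_push, [OP_ADD] = &&do_add,
        [OP_MUL] = &&do_mul,   [OP_HALT] = &&do_halt,
    };
    int stack[64], sp = 0;
#define DISPATCH() goto *targets[*code++]
    DISPATCH();
do_push:
    stack[sp++] = *code++;
    DISPATCH();
do_add:
    sp--; stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--; stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    return stack[sp - 1];
}
#undef DISPATCH
```

Both functions compute (2 + 3) * 4 = 20 for the program {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}; the only difference is where the indirect jump lives.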
- optimization 3: Unladen Swallow (an LLVM-based JIT for CPython)
- opt 4: timsort
  - I know about this one! :-)
  - adaptive sorting: exploits the order that already exists in the input
  - details in Objects/listsort.txt
  - 10x faster on mostly-sorted data
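The adaptive part can be sketched as run detection, in the spirit of the count_run step described in Objects/listsort.txt. This is a simplified, hypothetical version on int arrays: real timsort also extends short runs to a minimum length with binary insertion sort and merges runs with galloping, none of which is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the natural run starting at a[lo] (hi is exclusive).
 * A run is non-decreasing, or strictly decreasing; a descending run is
 * reversed in place, which stays stable because "strictly" means no
 * equal elements are ever swapped. */
size_t count_run(int *a, size_t lo, size_t hi)
{
    assert(lo < hi);
    size_t n = lo + 1;
    if (n == hi)
        return 1;
    if (a[n] < a[lo]) {                       /* strictly descending */
        while (n + 1 < hi && a[n + 1] < a[n])
            n++;
        n++;
        for (size_t i = lo, j = n - 1; i < j; i++, j--) {
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    } else {                                  /* non-decreasing */
        while (n + 1 < hi && a[n + 1] >= a[n])
            n++;
        n++;
    }
    return n - lo;
}

/* Number of runs a timsort-like merge phase would start from:
 * mostly-sorted input gives few long runs, so little merging is needed. */
size_t count_runs(int *a, size_t len)
{
    size_t runs = 0;
    for (size_t lo = 0; lo < len; lo += count_run(a, lo, len))
        runs++;
    return runs;
}
```

On already-sorted input count_runs returns 1, so the whole sort degenerates to a single linear scan.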
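- opt 5: dict
  - I know this stuff as well -_-;
  - why dicts matter: a global variable costs 1 dict lookup, a built-in function 2 lookups (globals, then builtins), an object attribute 1+, an object method 2+ (usually), a dict-like object 1+
  - hash table with closed hashing (open addressing); resized when 2/3 full
  - previously, name lookup was: locals? globals? builtins? else raise NameError
  - fastlocals: the compiler gives each local variable a slot in an array, so reading a local needs no dict lookup at all
  - this changes observable behavior (e.g. writing into locals() inside a function doesn't change the real local variables)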
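The difference between the old lookup chain and fast locals, sketched with a hypothetical Namespace type standing in for a real dict (linear search instead of hashing, to keep it short) plus a counter of lookups. load_name follows the locals/globals/builtins order from the notes; load_fast is what a compile-time slot index buys. All names here are illustrative, not CPython's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dict: a tiny name -> value table. */
typedef struct { const char *names[8]; int values[8]; size_t n; } Namespace;

static int lookups;   /* number of "dict lookups" performed so far */

static const int *lookup(const Namespace *ns, const char *name)
{
    lookups++;
    for (size_t i = 0; i < ns->n; i++)
        if (strcmp(ns->names[i], name) == 0)
            return &ns->values[i];
    return NULL;
}

/* Old scheme: locals, then globals, then builtins.
 * A local costs 1 lookup, a global 2, a builtin 3. */
const int *load_name(const Namespace *locals, const Namespace *globals,
                     const Namespace *builtins, const char *name)
{
    const int *v = lookup(locals, name);
    if (v == NULL) v = lookup(globals, name);
    if (v == NULL) v = lookup(builtins, name);
    return v;   /* NULL is where NameError would be raised */
}

/* Fast locals: the compiler already turned the name into a slot index,
 * so at run time it is one array read, with no hashing or comparing. */
int load_fast(const int *fastlocals, int slot) { return fastlocals[slot]; }
```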
- micro-opt 1: reordering dict comparisons based on statistics
  - probing code rewrite: replaced the polynomial-based probe sequence
  - analysed the hash-entry check: it was a chain of four ifs
  - gathered statistics and reordered the comparisons so the unlikely cases are tested last
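A sketch of a probe loop of this kind, assuming string keys and a fixed 8-slot table with no deletions. The recurrence i = 5*i + perturb + 1 with perturb shifted right by 5 each step is the perturbation scheme described in CPython's Objects/dictobject.c; the order of the checks (empty or same pointer, then hash, then full string compare) illustrates "likely and cheap first", not the exact four ifs from the talk. Entry, lookup, insert and the hash values in the usage are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PERTURB_SHIFT 5
#define TABLE_SIZE 8                  /* must be a power of two */

typedef struct { const char *key; size_t hash; int value; } Entry;

/* Returns the slot holding key, or the empty slot where it would go.
 * Terminates only because the table is never allowed to fill up
 * (it is resized at 2/3 full), so an empty slot always exists. */
Entry *lookup(Entry *table, const char *key, size_t hash)
{
    size_t mask = TABLE_SIZE - 1;
    size_t i = hash & mask;
    for (size_t perturb = hash; ; perturb >>= PERTURB_SHIFT) {
        Entry *ep = &table[i & mask];
        /* cheapest, most likely tests first: empty slot or identical pointer */
        if (ep->key == NULL || ep->key == key)
            return ep;
        /* only pay for strcmp when the stored hash matches */
        if (ep->hash == hash && strcmp(ep->key, key) == 0)
            return ep;
        i = (i << 2) + i + perturb + 1;   /* i = 5*i + perturb + 1 */
    }
}

void insert(Entry *table, const char *key, size_t hash, int value)
{
    Entry *ep = lookup(table, key, hash);
    ep->key = key;
    ep->hash = hash;
    ep->value = value;
}
```

Once perturb reaches 0 the recurrence becomes i = 5*i + 1 modulo a power of two, which visits every slot, so a free slot is always found.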
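- micro-opt 2: dict, part 2 (resizing)
  - see Objects/dictnotes.txt; the change was a one-line patch:
  - size * 2 => size * (size > 50000 ? 2 : 4), i.e. small dicts grow 4x, large ones (over 50,000 entries) only 2x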
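The resize trigger and the new growth rule as a small sketch. MIN_SIZE = 8 matches CPython 2.x's PyDict_MINSIZE; reading "size" in the patch as the number of live entries, and rounding the target up to the next power of two, follows CPython's dictresize as I understand it, so treat that mapping as an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SIZE 8   /* smallest table size, as with PyDict_MINSIZE */

/* True once the table is at least 2/3 full
 * (fill = live entries + deleted "dummy" slots). */
int needs_resize(size_t fill, size_t size)
{
    return fill * 3 >= size * 2;
}

/* Growth rule from the one-line patch: aim for 4x the live entries while
 * the dict is small, only 2x once it is big, then round up to the next
 * power of two (the probe loop relies on power-of-two sizes). */
size_t new_size(size_t used)
{
    size_t minused = (used > 50000 ? 2 : 4) * used;
    size_t size = MIN_SIZE;
    while (size <= minused)
        size <<= 1;
    return size;
}
```

With this rule a dict with 6 live entries grows to 32 slots, while one with 60,000 entries grows to 131072 slots (2 * 60000 rounded up) instead of 262144.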
- micro-opt 3: str.repeat
  - "ab" * 5 = "ababababab"
  - "x" * 1000000 was way slower than "x" * 1000 * 1000
  - fix: divide and conquer!!!
  - copy the already-built result onto itself, doubling it each time => O(log N) copy loops
  - Objects/*.tt
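The doubling trick as a sketch (str_repeat is a hypothetical name, not CPython's function): after writing one copy of s, each memcpy copies everything written so far onto the end, so repeating "x" a million times takes about 20 copies instead of a million.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Repeat s n times by doubling: copy s once, then keep copying the
 * already-filled prefix onto the end of itself. Caller frees the result;
 * returns NULL if allocation fails. */
char *str_repeat(const char *s, size_t n)
{
    size_t len = strlen(s);
    assert(len == 0 || n <= (SIZE_MAX - 1) / len);   /* no size overflow */
    size_t total = len * n;
    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;
    if (total > 0) {
        memcpy(out, s, len);
        size_t done = len;
        while (done < total) {
            /* copy at most what is already written, so the source
             * [0, chunk) and destination [done, done + chunk) never overlap */
            size_t chunk = done <= total - done ? done : total - done;
            memcpy(out + done, out, chunk);
            done += chunk;
        }
    }
    out[total] = '\0';
    return out;
}
```

The non-overlap comment matters: memcpy is undefined for overlapping regions, and capping each chunk at the already-written length keeps the copies legal without needing memmove.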
(Notes typed while listening to the talk, pasted as-is without cleanup)


