
Encode it to a bytes object in a fixed encoding, then convert the bytes to an int with int.from_bytes. To reverse the process, call to_bytes on the resulting int, then decode back to str:

    mystring = "Welcome to the InterStar cafe, serving you since 2412!"
    mybytes = mystring.encode('utf-8')  # UTF-8 assumed here; any fixed encoding works
    myint = int.from_bytes(mybytes, 'little')
    recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
    recoveredstring = recoveredbytes.decode('utf-8')

This has one flaw, which is that if the string ends in NUL characters ('\0' / '\x00') you'll lose them (switching to 'big' byte order would lose them from the front).

SilentGhost: dis.dis does demonstrate underlying conceptual / executional complexity. After all, the OP complained about the original replacement chain being too 'clumsy', not too 'slow'.

I recommend against using regular expressions where not inevitable; they just add conceptual overhead and a speed penalty otherwise. To use translate() here is IMHO just the wrong tool, and nowhere near as conceptually simple and generic as the original replacement chain.

So you say tamaytoes, and I say tomahtoes: the original solution is quite good in terms of clarity and genericity. To make it a little denser and more parametrized, consider changing it to a table of replacements, phone_nr_translations = [ ... ], that is applied in a loop (for probe, replacement in phone_nr_translations: ...).

In this special application, of course, what you really want to do is just cancel out any unwanted characters, so you can simplify this to a single string of probes: probes = ' ()'.

Coming to think of it, it is not quite clear to me why you want to turn a phone nr into an integer: that is simply the wrong data type. This can be demonstrated by the fact that, at least in mobile nets, + and # and maybe more are valid characters in a dial string (dial, string, see?).

But apart from that, sanitizing a user-input phone nr to get a normalized and safe representation is a very, very valid concern; only I feel that your methodology is too specific. Why not rewrite the sanitizing method into something very generic without it becoming more complex? After all, how can you be sure your users never input other deviant characters in that web form field?

So what you really want is not to disallow specific characters (there are about a hundred thousand defined codepoints in Unicode 5.1, so how do you catch up with those?), but to allow exactly those characters that are deemed legal in dial strings. And you can do that with a regular expression:

    from re import compile as _new_regex
    # pattern assumed: anything outside the legal set 0-9, # and +
    illegal_phone_nr_chrs_re = _new_regex( r"[^0-9#+]" )

    def sanitize_phone_nr( phone_nr ):
        return illegal_phone_nr_chrs_re.sub( '', phone_nr )

or with a set:

    legal_phone_nr_chrs = set( '0123456789#+' )

    def sanitize_phone_nr( phone_nr ):
        return ''.join( chr for chr in phone_nr if chr in legal_phone_nr_chrs )

That last stanza could well be written on a single line. The disadvantage of this solution would be that you iterate over the input characters from within Python, not making use of the potentially speedier C traversal as offered by str.replace() or even a regular expression. However, performance would in any case depend on the expected usage pattern (I am sure you truncate your phone nrs first thing, right? So those would be many small strings to be processed, not a few big ones).

Notice a few points here: I strive for clarity, which is why I try to avoid over-using abbreviations. chr for character, nr for number and R for the return value (more likely to be, ugh, retval where used in the standard library) are in my style book. Programming is about getting things understood and done, not about programmers writing code that approaches the spatial efficiency of gzip. Now look, the last solution does pretty much what the OP managed to get done (and more) in two lines of code if need be, whereas the OP's code

    class Phone():
        def __init__( self, input ): self.phone = self._sanitize( input )
        def _sanitize( self, input ): return input.replace( ' ', '' ).replace( '(', '' ).replace( ')', '' )

can hardly be compressed below four lines. See what additional baggage that strictly-OOP solution gives you? I believe it can be left out of the picture most of the time.
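To make the difference between the two sanitizing strategies in this answer concrete, here is a minimal Python 3 sketch. It is not the answerer's exact code. The replacement pairs, the regex pattern, the upper-case names and the sample input are all assumptions made for illustration; only the legal character set '0123456789#+' is taken from the answer itself.

```python
import re

# Hypothetical (probe, replacement) pairs for the blacklist approach.
PHONE_NR_TRANSLATIONS = [(" ", ""), ("(", ""), (")", "")]

def sanitize_by_replacement(phone_nr):
    """Blacklist: apply each replacement in turn."""
    result = phone_nr
    for probe, replacement in PHONE_NR_TRANSLATIONS:
        result = result.replace(probe, replacement)
    return result

# Whitelist: the characters the answer treats as legal in a dial string.
LEGAL_PHONE_NR_CHRS = set("0123456789#+")
# Assumed pattern: the complement of the legal set above.
ILLEGAL_PHONE_NR_CHRS_RE = re.compile(r"[^0-9#+]")

def sanitize_by_set(phone_nr):
    """Whitelist via a set: keep only legal characters."""
    return "".join(c for c in phone_nr if c in LEGAL_PHONE_NR_CHRS)

def sanitize_by_regex(phone_nr):
    """Whitelist via a regex: delete every illegal character."""
    return ILLEGAL_PHONE_NR_CHRS_RE.sub("", phone_nr)

raw = "+1 (555) 010-9999 ext#12"  # made-up input with unexpected characters
print(sanitize_by_replacement(raw))
print(sanitize_by_set(raw))
print(sanitize_by_regex(raw))
```

With this input the blacklist version lets the hyphen and the letters through, because nobody listed them, while both whitelist versions drop everything that is not explicitly allowed.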

How about just using regular expressions?

    >>> import re
    >>> re.sub(r'[^0-9]', '', num)  # Match all non-digits ([^0-9]) and replace them with an empty string wherever found in the `num` variable.

The suggestion made by ChristopheD will work just fine, but is not as efficient. A test program using the dis module demonstrates this (see Doug Hellmann's PyMOTW on the module for more detailed info); the translate variant under test is:

    print (TEST_PHONE_NUM).translate(None, ' ()')

The translate method will be the most efficient, though it relies on Python 2.6+ (this two-argument form is Python 2 syntax). The regex is slightly less efficient, but more compatible (which I see is a requirement for you). The original replace method will add 6 additional instructions per replacement, while all of the others will stay constant.

On a side note, store your phone numbers as strings to deal with leading zeros, and use a phone formatter where needed.
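For readers on Python 3, where str.translate takes a single table argument, the comparison above could be sketched roughly as follows. The sample number and the helper names are hypothetical, and bytecode instruction counts differ between CPython versions, so the figure of 6 instructions per replacement should not be expected to reproduce exactly. The sketch only shows that the replace chain grows with each added replacement while the other variants do not.

```python
import dis
import re

TEST_PHONE_NUM = "(020) 7946 0018"  # hypothetical sample value

# Python 3 spelling of the translate approach: the third argument of
# str.maketrans lists the characters to delete.
DROP_TABLE = str.maketrans("", "", " ()")

def strip_with_translate(phone):
    return phone.translate(DROP_TABLE)

def strip_with_regex(phone):
    # Removes every non-digit, so it is stricter than the translate version.
    return re.sub(r"[^0-9]", "", phone)

def strip_with_replace(phone):
    return phone.replace(" ", "").replace("(", "").replace(")", "")

def instruction_count(func):
    # Number of bytecode instructions in func; values vary by CPython version.
    return sum(1 for _ in dis.get_instructions(func))

translated = strip_with_translate(TEST_PHONE_NUM)
regexed = strip_with_regex(TEST_PHONE_NUM)
replaced = strip_with_replace(TEST_PHONE_NUM)
print(translated, regexed, replaced)

# Each extra .replace() call adds instructions; translate stays constant.
print(instruction_count(strip_with_replace), instruction_count(strip_with_translate))

# Converting to int silently drops the leading zero, hence "store as strings".
print(int(translated))
```

Keeping the sanitized value as a str preserves the leading zero that the final int() call loses.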
