
Wednesday, May 8, 2013

C++: improved string tokenizer

Using templates, the old string tokenizer can be improved further:

#include <string>

template <typename T>
void tokenize(const std::string &str,
              T &tokens,
              const std::string &delimiters = " ") {
  // Skip leading delimiters, then find the end of the first token.
  std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
  std::string::size_type pos = str.find_first_of(delimiters, lastPos);

  while (std::string::npos != pos || std::string::npos != lastPos) {
    // Store the token, then move on to the start and end of the next one.
    tokens.push_back(str.substr(lastPos, pos - lastPos));
    lastPos = str.find_first_not_of(delimiters, pos);
    pos = str.find_first_of(delimiters, lastPos);
  }
}

This allows any container with a push_back() method to receive the extracted tokens.

std::list<std::string> tokens;
tokenize("I want to tokenize this!", tokens);

tokenize("2012-02-20", tokens, "-");

or

std::vector<std::string> tokens;
tokenize("I want to tokenize this!", tokens);

tokenize("2012-02-20", tokens, "-");
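
The usage above can be put together into a small self-contained sketch. The helper tokenize_both() is a hypothetical name introduced only for this illustration, and std::deque is an extra container chosen here to show that anything with push_back() works. Note that the second call appends to the same container rather than replacing its contents.

```cpp
#include <deque>
#include <list>
#include <string>

// Same algorithm as the template above; T only needs push_back(std::string).
template <typename T>
void tokenize(const std::string &str, T &tokens,
              const std::string &delimiters = " ") {
  std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
  std::string::size_type pos = str.find_first_of(delimiters, lastPos);

  while (std::string::npos != pos || std::string::npos != lastPos) {
    tokens.push_back(str.substr(lastPos, pos - lastPos));
    lastPos = str.find_first_not_of(delimiters, pos);
    pos = str.find_first_of(delimiters, lastPos);
  }
}

// Hypothetical helper (not from the original post): runs both example
// calls against one container of type T and returns the result.
template <typename T>
T tokenize_both() {
  T tokens;
  tokenize("I want to tokenize this!", tokens);  // adds 5 tokens
  tokenize("2012-02-20", tokens, "-");           // appends 3 more
  return tokens;
}
```

Calling tokenize_both<std::list<std::string> >() or tokenize_both<std::deque<std::string> >() from main() yields eight tokens in both cases: the five words of the sentence followed by "2012", "02" and "20".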

Thursday, February 23, 2012

C++: string tokenizer

This function splits a string into substrings. It uses a list of delimiter characters to find the boundaries of the substrings.

#include <string>
#include <vector>

void tokenize(const std::string &str,
              std::vector<std::string> &tokens,
              const std::string &delimiters = " ") {
  // Skip leading delimiters, then find the end of the first token.
  std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
  std::string::size_type pos = str.find_first_of(delimiters, lastPos);

  while (std::string::npos != pos || std::string::npos != lastPos) {
    // Store the token, then move on to the start and end of the next one.
    tokens.push_back(str.substr(lastPos, pos - lastPos));
    lastPos = str.find_first_not_of(delimiters, pos);
    pos = str.find_first_of(delimiters, lastPos);
  }
}

Usage:
std::vector<std::string> tokens;
tokenize("I want to tokenize this!", tokens);

tokenize("2012-02-20", tokens, "-");
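
Because the delimiter argument is treated as a set of characters, a few edge cases are worth spelling out. The sketch below wraps the function in a hypothetical split() helper (a name introduced here, not part of the original post) that returns the tokens by value, which makes the behavior easier to inspect. It suggests that runs of delimiters are collapsed, leading and trailing delimiters are skipped, and empty tokens are never produced.

```cpp
#include <string>
#include <vector>

// The tokenizer from this post, repeated so the example is self-contained.
void tokenize(const std::string &str,
              std::vector<std::string> &tokens,
              const std::string &delimiters = " ") {
  std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
  std::string::size_type pos = str.find_first_of(delimiters, lastPos);

  while (std::string::npos != pos || std::string::npos != lastPos) {
    tokens.push_back(str.substr(lastPos, pos - lastPos));
    lastPos = str.find_first_not_of(delimiters, pos);
    pos = str.find_first_of(delimiters, lastPos);
  }
}

// Hypothetical convenience wrapper: returns a fresh vector per call.
std::vector<std::string> split(const std::string &str,
                               const std::string &delimiters = " ") {
  std::vector<std::string> tokens;
  tokenize(str, tokens, delimiters);
  return tokens;
}
```

With this wrapper, split("2012-02-20 13:45", "- :") gives the five tokens "2012", "02", "20", "13" and "45", split("  a,,b  ", " ,") gives just "a" and "b", and both split("") and split("---", "-") return an empty vector. If empty fields between adjacent delimiters matter (as in CSV data), this tokenizer is not the right tool as written.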