Ultimate Rust Crash Course Primitive Types and Control Flow

By: Nathan Stocks

4 minutes


More than you ever wanted to know about Strings in Rust, so you can stop running into walls and get on with your life.

In this video:

Strings and borrowed string slices
How a string is implemented, high-level
UTF-8
Bytes
Unicode scalars
Graphemes
Iterators (a little bit) and .nth()

Transcript

Strings. I'm going to warn you up front: here be dragons. I'll do my best to steer you right. There are at least six types of strings in the Rust standard library, but we mostly care about two of them, which overlap each other. The first is called a string slice (`str`), and you will almost always see it as a borrowed string slice (`&str`). We'll talk more about borrowing later.

A string literal is always a borrowed string slice. A borrowed string slice is often referred to as a "string", which can be really confusing when you learn that the other string type is `String` with a capital S. The biggest difference between the two is that the data in a borrowed string slice cannot be modified, while the data in a `String` can be modified. You will often create a `String` by calling the `to_string()` method on a borrowed string slice, or by passing a borrowed string slice to `String::from()`. A borrowed string slice is internally made up of a pointer to some bytes and a length. A `String` is made up of a pointer to some bytes, a length, and a capacity that may be higher than what is currently being used. In other words, a borrowed string slice is a subset of a `String` in more ways than one, which is why they share a bunch of other characteristics.
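To make the two types concrete, here is a minimal sketch using only the standard library. The helper `make_owned` and the variable names are illustrative, not from the video. Capacity growth and the in-memory size of `String` are implementation details, so the sketch prints those values instead of promising specific numbers.

```rust
use std::mem::size_of;

/// Illustrative helper: copies a borrowed slice into a new, growable String.
fn make_owned(slice: &str) -> String {
    // `to_string()` and `String::from()` both allocate a String holding a copy of the bytes.
    let mut owned = slice.to_string();
    assert_eq!(owned, String::from(slice));
    // The String owns its data, so it can be modified; `slice` cannot be.
    owned.push_str(", world");
    owned
}

fn main() {
    // A string literal is a borrowed string slice (&'static str).
    let literal: &'static str = "hello";
    println!("{}", make_owned(literal)); // prints "hello, world"

    // Capacity is at least the length and may be larger, leaving room to grow.
    let mut buf = String::with_capacity(16);
    buf.push_str("hi");
    println!("len = {}, capacity = {}", buf.len(), buf.capacity());

    // &str is a (pointer, length) pair. String also stores a capacity; on the
    // current standard library that makes it three words, though that layout
    // is not a documented guarantee.
    println!("&str: {} bytes, String: {} bytes", size_of::<&str>(), size_of::<String>());
}
```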

For example, both string types are always valid UTF-8: by definition, by compiler enforcement, and by runtime checks. Also, strings cannot be indexed by character position. Why not? Because English is not the only language in the world. In fact, Google told me that there are over 6,900 living languages, plus emojis on top of that, and they all seem to make their way into Unicode. And strings are Unicode, which means things get complicated.
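A small sketch of the runtime UTF-8 check and of why integer indexing is off the table. The helper `from_bytes` is a hypothetical name for this example; `String::from_utf8` and `str::get` are standard library methods. Non-ASCII characters are written as `\u{...}` escapes so the exact scalar values are unambiguous.

```rust
/// Hypothetical helper: returns Some(String) only if `bytes` is valid UTF-8.
fn from_bytes(bytes: Vec<u8>) -> Option<String> {
    // String::from_utf8 validates at runtime and returns Err for invalid input.
    String::from_utf8(bytes).ok()
}

fn main() {
    // "é" (U+00E9) is a single character but two bytes in UTF-8: 0xC3 0xA9.
    assert_eq!(from_bytes(vec![0xC3, 0xA9]).as_deref(), Some("\u{e9}"));
    // 0xC3 on its own starts a two-byte sequence that never finishes: rejected.
    assert_eq!(from_bytes(vec![0xC3]), None);

    let word = "h\u{e9}llo"; // "héllo", with é stored as the single scalar U+00E9
    // let c = word[1]; // compile error: `str` cannot be indexed by an integer
    // Byte-range slicing compiles, but the range must land on character
    // boundaries; `get` returns None where `&word[0..2]` would panic.
    assert_eq!(word.get(0..2), None); // byte 2 falls inside "é"
    assert_eq!(word.get(0..3), Some("h\u{e9}"));
    println!("all checks passed");
}
```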

Let's take a look at the Thai word sawatdee. Let's say that we wanted to get this character, which we think of as index 3. Ultimately, this string is stored as a vector of 18 bytes. Would we get what we wanted if we indexed into it by bytes? Not even close. Unicode scalars in UTF-8 can be represented by 1, 2, 3, or 4 bytes, and you have to traverse the bytes in order to tell where one scalar ends and the next begins. In this case, every three bytes is a Unicode scalar. So if there were a way to index into the scalars, would we get what we want? Closer, but still off. Diacritics are Unicode scalars that combine with other Unicode scalars to produce a different grapheme, and the grapheme is usually what we care about. So now you understand that graphemes decompose into variable numbers of scalars, which decompose into variable numbers of bytes. As part of Rust's emphasis on speed, indexing operations on standard library collections like vectors are expected to be constant-time operations.
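The transcript doesn't show the word itself, so this sketch assumes it is สวัสดี (sawatdee, "hello"), which matches the description: six scalars of three bytes each, 18 bytes in total, two of them combining vowel marks. It looks at the byte and scalar layers using only the standard library.

```rust
/// Assumed example word: "sawatdee" in Thai script.
const WORD: &str = "สวัสดี";

fn main() {
    // Layer 1: the raw UTF-8 bytes. Eighteen of them.
    let bytes = WORD.as_bytes();
    println!("{} bytes", bytes.len()); // prints "18 bytes"
    // Byte index 3 is only the first byte of the second scalar, not a character.
    println!("byte 3 = {:#04x}", bytes[3]); // prints "byte 3 = 0xe0"

    // Layer 2: Unicode scalars (Rust `char`s). Here every scalar is 3 bytes.
    let scalars: Vec<char> = WORD.chars().collect();
    println!("{} scalars", scalars.len()); // prints "6 scalars"
    assert!(scalars.iter().all(|c| c.len_utf8() == 3));

    // Scalar index 3 is the second 'ส'. The marks U+0E31 and U+0E35 combine
    // with the scalar before them, so a reader sees four graphemes: ส, วั, ส, ดี.
    println!("scalar 3 = {}", scalars[3]); // prints "scalar 3 = ส"
}
```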

You can't offer that guarantee with strings, because the bytes, which are indexable, aren't guaranteed to be what people want when they index into a string, and the graphemes, which people do want, can only be retrieved after slowly examining a sequence of bytes. So when presented with a string, you have some options. You can use the `bytes()` method to iterate over the UTF-8 bytes, or `as_bytes()` to get a byte slice that you can index into if you want. Since bytes are a fixed size, this actually works fine for simple English text, as long as you stick to the portion that overlaps ASCII. You can use the `chars()` method to retrieve an iterator that you can use to iterate through the Unicode scalars. And finally, you can use a package like `unicode-segmentation`, which provides handy functions that return iterators that handle graphemes of various types.
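A sketch of the three options. Options 1 and 2 use only the standard library. For option 3 the video points to a package like `unicode-segmentation`; to keep this sketch dependency-free, it instead uses a hypothetical `toy_graphemes` helper that only knows about Thai combining marks. That helper is a stand-in for illustration, not a real grapheme segmenter (real segmentation follows Unicode UAX #29), and the Thai word is the same assumed example as before.

```rust
/// Hypothetical helper: true for the Thai vowel and tone marks (all nonspacing)
/// that attach to the preceding scalar. Not a general Unicode rule.
fn is_thai_combining(c: char) -> bool {
    matches!(c, '\u{0E31}' | '\u{0E34}'..='\u{0E3A}' | '\u{0E47}'..='\u{0E4E}')
}

/// Toy stand-in for grapheme segmentation: glue each Thai combining mark onto
/// the cluster before it. For real text, use a crate such as unicode-segmentation.
fn toy_graphemes(s: &str) -> Vec<String> {
    let mut clusters: Vec<String> = Vec::new();
    for c in s.chars() {
        if is_thai_combining(c) {
            if let Some(last) = clusters.last_mut() {
                last.push(c);
                continue;
            }
        }
        clusters.push(c.to_string());
    }
    clusters
}

fn main() {
    // Option 1: bytes. Indexing a byte slice is constant-time, and for
    // ASCII-only text each byte is exactly one character.
    let english = "hello";
    println!("{}", english.as_bytes()[1] as char); // prints "e"
    // `bytes()` itself is an iterator over the same u8 values.
    println!("{}", english.bytes().count()); // prints "5"

    // Option 2: `chars()` iterates over Unicode scalars.
    let thai = "สวัสดี";
    println!("{}", thai.chars().count()); // prints "6"

    // Option 3: graphemes, what a reader would call characters.
    let clusters = toy_graphemes(thai);
    println!("{}", clusters.join(" | ")); // prints "ส | วั | ส | ดี"
}
```

With the `unicode-segmentation` crate added as a dependency, the toy helper would be replaced by importing `unicode_segmentation::UnicodeSegmentation` and calling `thai.graphemes(true)`, which returns an iterator over extended grapheme clusters.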

With each of these approaches, you know that if you can index into something, it will be a fast, constant-time operation, while if you iterate through something, it is going to process some variable number of bytes during each iteration of the loop. Hopefully you can sidestep most of these issues by using one of the many helper methods created to manipulate strings. But if you do end up manually using one of the iterators, iterators have a handy method called `nth()` that you can use in place of indexing. And now you know why you have to pick an iterator and use `nth()` instead of being able to index into a string directly. In the next video, we will talk about ownership.
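To make the advice about helper methods and `nth()` concrete, here is a short sketch. `nth_char` is a hypothetical helper name; `starts_with`, `to_uppercase`, `find`, `chars`, and `Iterator::nth` are standard library methods. Note that `find` returns a byte offset, not a character position.

```rust
/// Hypothetical helper: the n-th Unicode scalar, found by walking the string.
/// `nth` is O(n) here because it has to decode every scalar before index n.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let text = "Hello, w\u{f6}rld"; // "Hello, wörld"

    // Helper methods often make manual indexing unnecessary.
    println!("{}", text.starts_with("Hello")); // prints "true"
    println!("{}", text.to_uppercase()); // prints "HELLO, WÖRLD"
    println!("{:?}", text.find('w')); // prints "Some(7)", a byte offset

    // `nth` in place of indexing: `text[8]` would not compile.
    println!("{:?}", nth_char(text, 8)); // prints "Some('ö')"
    println!("{:?}", nth_char(text, 99)); // prints "None"

    // `nth` consumes what it skips, so repeated calls on one iterator move forward.
    let mut it = text.chars();
    println!("{:?}", it.nth(1)); // prints "Some('e')"
    println!("{:?}", it.nth(1)); // prints "Some('l')", the char at index 3
}
```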
