[Getting Started with PyTorch] Dataset, DataLoader, and Transforms (feat. the MNIST dataset)


by koharin 2023. 8. 6. 16:42


import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

Getting the device

# Check Device
if torch.cuda.is_available():
  DEVICE = torch.device('cuda') # use GPU
else:
  DEVICE = torch.device('cpu') # use CPU

print('Using PyTorch Version: ', torch.__version__, ', Device: ', DEVICE)

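Training runs on a hardware accelerator such as a GPU (CUDA) or MPS when one is available.
The snippet above checks only torch.cuda.is_available() and falls back to the CPU otherwise; it does not check for MPS.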
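
Since the text mentions MPS but the snippet only checks CUDA, here is a minimal sketch of a device picker that also tries MPS. The helper name pick_device is hypothetical (not from the original post), and the sketch assumes a PyTorch build where torch.backends.mps may or may not exist, which is why it is looked up with getattr.

```python
import torch


def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is only present in newer PyTorch builds,
    # so look it up defensively instead of assuming it exists.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


DEVICE = pick_device()
x = torch.ones(2, 3).to(DEVICE)  # tensors (and models) are moved with .to()
print("Using device:", DEVICE, "| tensor lives on:", x.device)
```

On a machine without any accelerator this simply selects the CPU, so the rest of the post runs unchanged.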

Downloading the MNIST dataset

# Download MNIST Dataset

training_data = datasets.MNIST(
    root="Dataset/MNIST",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="Dataset/MNIST",
    train=False,
    download=True,
    transform=ToTensor()
)

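download: downloads the data from the internet (skipped if it already exists under root)
transform: preprocessing applied as each image is loaded
- ToTensor(): converts the image to a tensor
- Scales pixel values from 0 ~ 255 to 0 ~ 1 (normalization)
- The result is used as model input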

Visualizing the MNIST training data

# Visualize MNIST Dataset 

labels_map = {
    0: "0",
    1: "1",
    2: "2",
    3: "3",
    4: "4",
    5: "5",
    6: "6",
    7: "7",
    8: "8",
    9: "9",
}

figure = plt.figure(figsize=(8,8))
cols, rows = 3, 3
for i in range(1, cols*rows + 1):
  sample_idx = torch.randint(len(training_data), size=(1,)).item()
  img, label = training_data[sample_idx]
  figure.add_subplot(rows, cols, i)
  plt.title(labels_map[label])
  plt.axis("off")
  plt.imshow(img.squeeze(), cmap="gray")
plt.show()

DataLoader

# split dataset per mini_batch

BATCH_SIZE = 64

train_dataloader = DataLoader(training_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=True)

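BATCH_SIZE = 64: each mini-batch contains 64 samples
Iteration: the number of training steps performed with mini-batches (one iteration per mini-batch). With the 60,000 MNIST training images, one pass over the data takes 938 iterations, and the last mini-batch holds only 32 samples.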
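
The relationship between batch size and iteration count can be checked without downloading MNIST. This sketch uses a TensorDataset of zeros with the same number of samples as the MNIST training split; the stand-in tensors and the names features, labels, and loader are assumptions for illustration only.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

NUM_SAMPLES = 60_000  # size of the MNIST training split
BATCH_SIZE = 64

# Tiny stand-in tensors: one feature per sample keeps memory use low.
features = torch.zeros(NUM_SAMPLES, 1)
labels = torch.zeros(NUM_SAMPLES, dtype=torch.long)
loader = DataLoader(TensorDataset(features, labels), batch_size=BATCH_SIZE, shuffle=True)

iterations_per_epoch = len(loader)          # number of mini-batches per epoch
batch_sizes = [len(x) for x, _ in loader]   # actual size of every mini-batch

print(iterations_per_epoch)             # 938
print(batch_sizes[0], batch_sizes[-1])  # 64 32
```

Because 60,000 is not a multiple of 64, the final mini-batch is smaller than the rest; passing drop_last=True to DataLoader would discard it instead.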

Checking the data

# check data

for (train_features, train_labels) in train_dataloader:
  print('x_train: ', train_features.size(), ', type: ', train_features.type())
  print('y_train: ', train_labels.size(), ', type: ', train_labels.type())
  img = train_features[0].squeeze()  # first image of the batch, channel dim removed
  label = train_labels[0]
  plt.imshow(img, cmap='gray')
  plt.show()
  print(f"Label: {label}")
  break  # inspect only the first mini-batch instead of plotting every batch

Each iteration returns one batch of features and labels.
x_train: features of the training data
- Number of images: 64 (per mini-batch)
- Width: 28, height: 28
- 1: number of channels (grayscale)
- Shape: [64, 1, 28, 28]
- Type: torch.FloatTensor
y_train: labels (ground truth) of the training data
- Number of labels: 64 (per mini-batch)
- Each image has exactly one label, so there are 64 in total
- Type: torch.LongTensor
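
The batch shapes and types listed above can be reproduced with a small custom Dataset, which also shows the two methods a Dataset subclass has to provide. The class FakeDigits and its random contents are hypothetical stand-ins for MNIST, not part of the original post.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FakeDigits(Dataset):
    """Hypothetical MNIST-shaped dataset filled with random values."""

    def __init__(self, n: int = 256):
        self.images = torch.rand(n, 1, 28, 28)    # float32 values in [0, 1)
        self.labels = torch.randint(0, 10, (n,))  # int64 class ids 0..9

    def __len__(self) -> int:
        # DataLoader uses this to know how many samples exist.
        return len(self.images)

    def __getitem__(self, idx: int):
        # Return one (feature, label) pair; DataLoader stacks them into batches.
        return self.images[idx], self.labels[idx]


loader = DataLoader(FakeDigits(), batch_size=64, shuffle=True)
x_batch, y_batch = next(iter(loader))  # fetch a single mini-batch

print(x_batch.size(), x_batch.type())
print(y_batch.size(), y_batch.type())
```

Using next(iter(loader)) fetches one mini-batch without looping over the whole dataset, which is convenient for this kind of sanity check.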


License: Attribution, Non-Commercial, No Derivatives
