
Contents

Downloading data
Changing the default cache location (TRANSFORMERS_CACHE)
Saving to disk and loading locally
    Saving
    Loading
Reading .arrow data

Downloading data

1. Download with Python code

    from datasets import load_dataset

    imdb = load_dataset("imdb")

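The name parameter selects a dataset configuration. Here it is either full or mini: full downloads all of the data, while mini downloads only a small subset.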

    dataset = load_dataset(model_name, name="full")

    imdb

Result:

    DatasetDict({
        train: Dataset({
            features: ['text', 'label'],
            num_rows: 25000
        })
        test: Dataset({
            features: ['text', 'label'],
            num_rows: 25000
        })
        unsupervised: Dataset({
            features: ['text', 'label'],
            num_rows: 50000
        })
    })

By default the data is saved under the ~/.cache/huggingface folder. The layout is as follows:

    $ cd datasets/imdb/
    $ tree
    .
    └── plain_text
        └── 0.0.0
            ├── e6281661ce1c48d982bc483cf8a173c1bbeb5d31
            │   ├── dataset_info.json
            │   ├── imdb-test.arrow
            │   ├── imdb-train.arrow
            │   └── imdb-unsupervised.arrow
            ├── e6281661ce1c48d982bc483cf8a173c1bbeb5d31.incomplete_info.lock
            └── e6281661ce1c48d982bc483cf8a173c1bbeb5d31_builder.lock

    3 directories, 6 files

2. Download with the huggingface-cli command

A download made this way is also saved under the ~/.cache/huggingface folder.

    huggingface-cli download --repo-type dataset imdb

3. git

Changing the default cache location (TRANSFORMERS_CACHE)

Add an environment variable:

    export TRANSFORMERS_CACHE=path/

Or set it in code:

    import os

    # Replace path/ with your own cache directory; set this before importing the library
    os.environ["TRANSFORMERS_CACHE"] = "path/"

Note that TRANSFORMERS_CACHE controls where the transformers library caches models. The datasets library reads its cache location from HF_DATASETS_CACHE (or from HF_HOME, which defaults to ~/.cache/huggingface); you can also pass cache_dir to load_dataset.

Saving to disk and loading locally

Saving

    save_path = "/Users/xx/Downloads/imdb"
    imdb.save_to_disk(save_path)

Result:

    Saving the dataset (1/1 shards): 100%|█| 25000/25000 [00:00<00:00, 97903.42 exam
    Saving the dataset (1/1 shards): 100%|█| 25000/25000 [00:00<00:00, 251032.07 exa
    Saving the dataset (1/1 shards): 100%|█| 50000/50000 [00:00<00:00, 88591.53 exam

    from datasets import load_from_disk

    imdb2 = load_from_disk(save_path)
    imdb2

Result:

    DatasetDict({
        train: Dataset({
            features: ['text', 'label'],
            num_rows: 25000
        })
        test: Dataset({
            features: ['text', 'label'],
            num_rows: 25000
        })
        unsupervised: Dataset({
            features: ['text', 'label'],
            num_rows: 50000
        })
    })

The storage format is as follows:

    $ cd imdb/
    $ tree
    .
    ├── dataset_dict.json
    ├── test
    │   ├── data-00000-of-00001.arrow
    │   ├── dataset_info.json
    │   └── state.json
    ├── train
    │   ├── data-00000-of-00001.arrow
    │   ├── dataset_info.json
    │   └── state.json
    └── unsupervised
        ├── data-00000-of-00001.arrow
        ├── dataset_info.json
        └── state.json

    3 directories, 10 files

Loading
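load_from_disk only accepts a directory written by save_to_disk. The following is a minimal sketch of a pre-check, assuming the marker files visible in the saved tree (dataset_dict.json for a DatasetDict; dataset_info.json plus state.json for a single Dataset). The helper name saved_layout is hypothetical and not part of the datasets library, and the temporary directory only imitates the layout with empty files.

```python
import tempfile
from pathlib import Path


def saved_layout(path):
    """Guess what kind of save_to_disk directory `path` is.

    Returns "DatasetDict", "Dataset", or None. This is an assumption-based
    check on marker file names, not a call into the datasets library.
    """
    p = Path(path)
    if not p.is_dir():
        return None  # a single .arrow file is never a valid target
    if (p / "dataset_dict.json").is_file():
        return "DatasetDict"
    if (p / "dataset_info.json").is_file() and (p / "state.json").is_file():
        return "Dataset"
    return None


# Build a stand-in directory tree with empty marker files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "imdb"
    (root / "test").mkdir(parents=True)
    (root / "dataset_dict.json").write_text("{}")
    for name in ("dataset_info.json", "state.json", "data-00000-of-00001.arrow"):
        (root / "test" / name).write_text("")

    results = {
        "root": saved_layout(root),
        "split": saved_layout(root / "test"),
        "file": saved_layout(root / "test" / "data-00000-of-00001.arrow"),
    }

print(results)
```

Under the same assumption, a folder inside ~/.cache/huggingface/datasets would return None, because the cache tree contains dataset_info.json but no state.json or dataset_dict.json.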

Loading only the test split

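    save_path1 = "/Users/xx/Downloads/imdb/test"
    imdb3 = load_from_disk(save_path1)
    imdb3

Result:

    Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })

    imdb4 = load_dataset("imdb")  # by default, loads the data in .cache
    imdb4 = load_dataset(path="/Users/xx/Downloads/imdb")

Result:

    Generating train split: 1 examples [00:00, 69.32 examples/s]
    Generating test split: 1 examples [00:00, 277.31 examples/s]

    imdb4

Result:

    DatasetDict({
        train: Dataset({
            features: ['_data_files', '_fingerprint', '_format_columns', '_format_kwargs', '_format_type', '_output_all_columns', '_split'],
            num_rows: 1
        })
        test: Dataset({
            features: ['_data_files', '_fingerprint', '_format_columns', '_format_kwargs', '_format_type', '_output_all_columns', '_split'],
            num_rows: 1
        })
    })

The columns in this last result are metadata fields rather than text and label, and each split has a single row, so load_dataset did not recover the saved data. For a directory written by save_to_disk, use load_from_disk.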
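The odd column names returned by load_dataset(path=...) match the top-level keys one would expect in a split's state.json, which suggests the JSON metadata was parsed instead of the Arrow data. The sketch below shows how to list those keys for comparison; the helper name state_keys is hypothetical, and the state.json it reads here is a stand-in with placeholder values, not a real file from the saved dataset.

```python
import json
import tempfile
from pathlib import Path


def state_keys(split_dir):
    """Return the sorted top-level keys of <split_dir>/state.json."""
    with open(Path(split_dir) / "state.json", encoding="utf-8") as f:
        return sorted(json.load(f))


# Stand-in content: only the key names matter, the values are placeholders.
fake_state = {
    "_data_files": [{"filename": "data-00000-of-00001.arrow"}],
    "_fingerprint": "placeholder",
    "_format_columns": None,
    "_format_kwargs": {},
    "_format_type": None,
    "_output_all_columns": False,
    "_split": "test",
}

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "state.json").write_text(json.dumps(fake_state), encoding="utf-8")
    keys = state_keys(tmp)

print(keys)
```

Pointing state_keys at a real split folder such as /Users/xx/Downloads/imdb/test lets you compare its keys with the features list printed for imdb4.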

Loading a single file: fails

    save_path2 = "/Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow"
    imdb4 = load_from_disk(save_path2)

Result:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
        raise FileNotFoundError(
    FileNotFoundError: Directory /Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow is neither a Dataset directory nor a DatasetDict directory.

Loading from .cache/huggingface/datasets does not work

load_from_disk fails at every level of the cache tree, from the dataset folder down to the individual .arrow file.

    path = "/Users/xx/.cache/huggingface/datasets/imdb"
    from datasets import load_from_disk

    imdb2 = load_from_disk(path)

Result:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
        raise FileNotFoundError(
    FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb is neither a Dataset directory nor a DatasetDict directory.

    path1 = "/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow"
    imdb2 = load_from_disk(path1)

Result:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
        raise FileNotFoundError(
    FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow is neither a Dataset directory nor a DatasetDict directory.

    path1 = "/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/"
    imdb2 = load_from_disk(path1)

Result:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
        raise FileNotFoundError(
    FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/ is neither a Dataset directory nor a DatasetDict directory.

    path1 = "/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/"
    imdb2 = load_from_disk(path1)

Result:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
        raise FileNotFoundError(
    FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/ is neither a Dataset directory nor a DatasetDict directory.

    path1 = "/Users/xx/.cache/huggingface/datasets/imdb/plain_text/"
    imdb2 = load_from_disk(path1)

Result:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/xx/miniconda3/lib/python3.11/site-packages/datasets/load.py", line 2215, in load_from_disk
        raise FileNotFoundError(
    FileNotFoundError: Directory /Users/xx/.cache/huggingface/datasets/imdb/plain_text/ is neither a Dataset directory nor a DatasetDict directory.

Reading .arrow data

Double-clicking a .arrow file does not open it in a readable form. The following code can be used to view its contents:

    import pyarrow.ipc

    def read_arrow_to_df_julia_ok(path):
        # Open the file as an Arrow IPC stream and convert it to a pandas DataFrame
        with open(path, "rb") as f:
            r = pyarrow.ipc.RecordBatchStreamReader(f)
            df = r.read_pandas()
        return df

    # A file written by save_to_disk
    path = "/Users/xx/Downloads/imdb/test/data-00000-of-00001.arrow"
    # Or a file from the cache (this assignment overrides the one above)
    path = "/Users/xx/.cache/huggingface/datasets/imdb/plain_text/0.0.0/e6281661ce1c48d982bc483cf8a173c1bbeb5d31/imdb-test.arrow"
    table = read_arrow_to_df_julia_ok(path)

Printing the data

    print("Data:\n", table)

Result:

    Data:
                                                         text  label
    0      I love sci-fi and am willing to put up with a ...      0
    1      Worth the entertainment value of a rental, esp...      0
    2      its a totally average film with a few semi-alr...      0
    3      STAR RATING: ***** Saturday Night **** Friday ...      0
    4      First off let me say, If you haven't enjoyed a...      0
    ...                                                  ...    ...
    24995  Just got around to seeing Monster Man yesterda...      1
    24996  I got this as part of a competition prize. I w...      1
    24997  I got Monster Man in a box set of three films ...      1
    24998  Five minutes in, i started to feel how naff th...      1
    24999  I caught this movie on the Sci-Fi channel rece...      1