API for STT (audio to text transcription) or some free software?

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v3"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,  # requires the accelerate package (pip install accelerate)
    use_safetensors=True,
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

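def speech_to_text(filename, language=None):
    """Transcribe an audio file; returns the text, or an "Error: ..." string on failure."""
    try:
        # Fall back to English when no language is given
        result = pipe(filename, generate_kwargs={"language": language or "english"})
        return result["text"]
    except Exception as e:
        return f"Error: {e}"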

if __name__ == "__main__":
    import argparse
    
    parser = argparse.ArgumentParser(description="Transcribe audio file using Whisper model")
    parser.add_argument("filename", help="Path to the audio file")
    parser.add_argument("--language", help="Language of the audio (default: english)", default="english")
    
    args = parser.parse_args()
    
    transcription = speech_to_text(args.filename, args.language)
    print("Transcription:", transcription)

this uses an nvidia GPU if one is available, otherwise the cpu

this script will auto-download the model and run it

RexOP

2024-10-12T06:26:01.924Z

woooah

so I just need to have Python installed or also pytorch?

and then I create a file and run it, right?

@ts-ignore

2024-10-12T06:26:49.720Z

yes

you need the cuda version of pytorch

also that transformers package

RexOP

2024-10-12T06:28:47.753Z

I've created a file in VSC (.py) and pasted the text you provided

RexOP

2024-10-12T06:29:12.853Z

where can I download them?

@ts-ignore

2024-10-12T06:29:19.795Z

https://pytorch.org/get-started/locally/

cuda builds of pytorch are only available for windows and linux (not mac)

cuda is nvidia's tech

if you're on mac, you might have to tweak the script to use GPU or whatever your mac has

this probably won't run on NPU

RexOP

2024-10-12T06:30:00.773Z

I'm on windows

@ts-ignore

2024-10-12T06:30:07.425Z

great

RexOP

2024-10-12T06:30:46.591Z

so the version of CUDA i need to install is related to the GPU i have in my pc?

@ts-ignore

2024-10-12T06:31:04.354Z

yes

but I think if you have the latest drivers, you can just install the latest cuda version of pytorch

it should work

that's what I did

RexOP

2024-10-12T06:32:59.860Z

oh thankss

I was struggling to find the version. I'm currently on a huawei laptop, so I think I have an integrated GPU

maybe it's better to run it on desktop computer

@ts-ignore

2024-10-12T06:34:09.090Z

if you don't have cuda, this script will use cpu

yeah
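
To see which backend the script would actually pick on a given machine, a quick check along these lines can help. It is only a sketch: `describe_backend` is a hypothetical helper name (not part of the script above), and it relies on nothing beyond PyTorch's `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def describe_backend() -> str:
    """Report which device PyTorch would use (hypothetical helper for illustration)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Same condition the script uses to choose "cuda:0"
        return f"CUDA GPU: {torch.cuda.get_device_name(0)}"
    return "CPU (no CUDA device visible to PyTorch)"


if __name__ == "__main__":
    print(describe_backend())
```

On a laptop with only an integrated GPU this should report the CPU case, which matches the fallback described above.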

RexOP

2024-10-12T06:34:58.675Z

but in this case do I need to install the CPU version, or if I install the GPU version will it fall back automatically?

@ts-ignore

2024-10-12T06:35:21.835Z

yeah cpu version

RexOP

2024-10-12T06:38:00.424Z

I have this one on matebook 14s, so it's cpu right?

and package is pip?

@Rex I have this one on matebook 14s, so it's cpu right?

@ts-ignore

2024-10-12T06:40:26.657Z

it's a gpu but an integrated one, so just use the cpu version of pytorch

@Rex and package is pip?

@ts-ignore

2024-10-12T06:40:37.999Z

if you used python.org to install python, yes

RexOP

2024-10-12T06:41:17.938Z

thanks, I'll try now :))

and the transformers package?

@ts-ignore

2024-10-12T06:44:32.602Z

pip install transformers

RexOP

2024-10-12T06:52:57.077Z

thankss, I've installed everything. I had to enable long paths because I was getting errors, and now I only get warnings...
Now I just need to run the file with Python, is there a specific command for pytorch?

and for the language I've just changed the word 'english' to 'italian' in my code

@ts-ignore

2024-10-12T06:54:35.678Z

just run the script with

python main.py <path to audio file> --language italian

if you want to tweak its options, take a look at the model card for openai/whisper-large-v3 on huggingface
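
For reference, this is roughly how the script's argument parser reads that command line. The sketch rebuilds the same two arguments in a hypothetical `build_parser` helper and parses a sample argument list; `lecture.mp3` is a made-up file name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the two arguments the script defines (helper name is illustrative)."""
    parser = argparse.ArgumentParser(description="Transcribe audio file using Whisper model")
    parser.add_argument("filename", help="Path to the audio file")
    parser.add_argument("--language", default="english",
                        help="Language of the audio (default: english)")
    return parser


# Equivalent to: python main.py lecture.mp3 --language italian
args = build_parser().parse_args(["lecture.mp3", "--language", "italian"])
print(args.filename, args.language)

# Leaving out --language falls back to the default
default_args = build_parser().parse_args(["lecture.mp3"])
print(default_args.language)
```

Passing `--language` on the command line means the word 'english' in the code does not need to be edited for each new language.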

RexOP

2024-10-12T06:58:15.684Z

I got all of this:

config.json: 100%|█████████████████████████████████████████████████████████████████| 1.27k/1.27k [00:00<?, ?B/s]
C:\Users\simon\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\huggingface_hub\file_download.py:147: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\simon\.cache\huggingface\hub\models--openai--whisper-large-v3. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. 
In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Traceback (most recent call last):
  File "D:\Programming\Personal\SbobAI\main.py", line 8, in <module>
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\simon\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\models\auto\auto_factory.py", line 564, in from_pretrained        
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\simon\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\modeling_utils.py", line 3372, in from_pretrained
    raise ImportError(
ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install 'accelerate>=0.26.0'`

1 error, I think because I wasn't in Administrator mode

@ts-ignore

2024-10-12T06:58:57.357Z

it tells you to install a package: pip install accelerate
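
A small pre-flight check can catch this kind of missing package before the model download starts. This is a sketch using only the standard library; the package list is taken from this thread, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Packages the script needs, as mentioned in this thread
REQUIRED_PACKAGES = ("torch", "transformers", "accelerate")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed")
```

For `torch` itself, the install command should still come from the selector on pytorch.org, since the CPU and CUDA builds are installed differently.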

also turn on developer mode
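
If Developer Mode is not an option, the warning text itself names an environment variable that silences it. A minimal sketch, assuming the variable is set before `transformers` (and therefore `huggingface_hub`) is imported; the value `"1"` is assumed here as a truthy setting.

```python
import os

# Must run before importing transformers / huggingface_hub,
# so it belongs at the very top of main.py.
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"
```

This only hides the warning. Without symlink support the cache still works, but in the degraded mode the warning describes, which may use more disk space.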

RexOP

2024-10-12T06:59:42.171Z

I feel embarrassed but I really don't know what it is

@ts-ignore

2024-10-12T07:00:42.607Z

https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development

RexOP

2024-10-12T07:03:04.298Z

thank youuu

it's pretty slow, but I think that's down to my connection

model.safetensors:  15%|███████▊                                            | 461M/3.09G [01:03<05:35, 7.83MB/s]

what are you using it for if I can ask? :))

@ts-ignore

2024-10-12T07:04:31.335Z

I tried to make a language learning app

it was less about learning and more about practicing

RexOP

2024-10-12T07:04:51.791Z

with which stack?

@ts-ignore

2024-10-12T07:05:03.877Z

I used this model to take your voice input, convert it to text, and feed that text to gemini/gpt to get a response/feedback

RexOP

2024-10-12T07:05:27.884Z

that's great! And how's it going?

@Rex with which stack?

@ts-ignore

2024-10-12T07:05:38.707Z

https://www.voicelearn.tech/blog/v1 scroll to very bottom

@Rex that's great! And how's it going?

@ts-ignore

2024-10-12T07:05:52.377Z

it's going pretty well

RexOP

2024-10-12T07:08:40.119Z

I love that!

I'm Italian and was in Sweden trying to learn Swedish using Duolingo (it's really difficult for speakers of a Latin language like ours, since it's closer to German)..
I ran into the same struggles you mention in your Motivation section

and how did you get to this point, if I can ask? (self-taught or school) :))

@ts-ignore

2024-10-12T07:11:03.666Z

I stopped spending time on duolingo and started spending more time learning on my own with online resources like books, A1 videos etc, and practicing in my app

@Rex and how did you get to this point, if I can ask? (self-taught or school) :))

@ts-ignore

2024-10-12T07:11:07.570Z

self taught

RexOP

2024-10-12T07:12:00.438Z

Love this, I'm also passionate about this area of learning, and a year ago I tried to make an app for learning words and expanding your vocabulary

@ts-ignore

2024-10-12T07:12:27.597Z

let's not pollute this chat and continue in #off-topic :)

RexOP

2024-10-12T07:13:20.646Z

yess, I just have one error in the code:

model.safetensors: 100%|███████████████████████████████████████████████████| 3.09G/3.09G [06:42<00:00, 7.67MB/s]
generation_config.json: 100%|██████████████████████████████████████████████| 3.90k/3.90k [00:00<00:00, 7.82MB/s]
preprocessor_config.json: 100%|████████████████████████████████████████████████████████| 340/340 [00:00<?, ?B/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████| 283k/283k [00:00<00:00, 1.09MB/s]
vocab.json: 100%|██████████████████████████████████████████████████████████| 1.04M/1.04M [00:00<00:00, 1.97MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████| 2.48M/2.48M [00:00<00:00, 4.93MB/s]
merges.txt: 100%|████████████████████████████████████████████████████████████| 494k/494k [00:00<00:00, 1.55MB/s]
normalizer.json: 100%|█████████████████████████████████████████████████████| 52.7k/52.7k [00:00<00:00, 45.3MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████| 34.6k/34.6k [00:00<00:00, 1.04MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████████| 2.07k/2.07k [00:00<?, ?B/s]
Transcription: Error: ffmpeg was not found but is required to load audio files from filename

@ts-ignore

2024-10-12T07:13:40.725Z

https://www.ffmpeg.org/download.html

install ffmpeg, add it to your PATH and restart your laptop

https://www.hostinger.in/tutorials/how-to-install-ffmpeg#How_to_Install_FFmpeg_on_Windows
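
To confirm that Python can actually see ffmpeg after the install, a check like the following may help. It is a sketch using only the standard library; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable found on PATH, or None."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH")
    else:
        # Print the first line of `ffmpeg -version` as a sanity check
        out = subprocess.run([path, "-version"], capture_output=True, text=True)
        print(out.stdout.splitlines()[0] if out.stdout else path)
```

If this still reports that ffmpeg is missing right after editing PATH, running it again from a freshly opened terminal is worth a try, since already-open terminals keep the old PATH.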

RexOP

2024-10-12T07:47:05.706Z

thanks!

I did it, and now it's running. I'll tell you if it worked once it finishes 🙌

just got these warnings:

Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\transformers\models\whisper\generation_whisper.py:496: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
You have passed language=italian, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of language=italian.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

unfortunately it stopped without printing the "Transcription:" output