Next.js Discord

Discord Forum

How to decrease loading time?

Unanswered
Dwarf Crocodile posted this in #help-forum
Dwarf Crocodile (OP)
Hey guys, I am new to AWS and stuff.

Right now, I have 10 .csv files of 20-50 MB each on S3.

I have a dropdown menu in the frontend where I select any one of these CSVs, and then I fetch its data in the frontend.

Some .csv files are heavy; it takes almost 10 seconds to load.

Any suggestions?

134 Replies

(unnamed user)
One solution for big data is using databases. Since you want to work with AWS, use DynamoDB. If you don't want to handle DynamoDB yourself, you can use GraphQL with AWS Amplify. Pretty useful if you ask me. That way your data is there in less than 500 ms.
Dwarf Crocodile (OP)
Actually, we have to select a protein from a dropdown, and for each protein there is a specific .csv file containing that protein's data.

Since proteins are made up of amino acids, which are made up of nitrogen, carbon, etc., the CSV files contain all those atoms, their respective positions, their respective branches, the mutation probabilities of the amino acids, and many other things. So it's quite a large file.

And we are not fetching the same attributes from those CSV files every time; that's why we always need to load the complete CSV.
In the future there might be more dropdown options, hence more CSV files.
Is storing all of these in a database a better option than storing them in S3 / CloudFront?
Komondor
Yes. Since you are having to fetch the entire file, parse it, and only return some attributes from it, you are essentially querying the file. It'd be much faster to query a database.
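To make that concrete: filtering a CSV means parsing every row yourself, while a database engine does the filtering for you. A minimal Python sketch (Python because the OP's stack turns out to be Flask/SQLite; the `atom`/`position`/`mutation_prob` columns are invented for illustration):

```python
import csv
import io
import sqlite3

# Tiny stand-in for one protein CSV (the real files are 5-50 MB).
csv_text = "atom,position,mutation_prob\nN,1,0.02\nC,2,0.10\nO,3,0.05\n"

# CSV approach: every row must be parsed just to find the ones you need.
rows = list(csv.DictReader(io.StringIO(csv_text)))
csv_hit = [r for r in rows if r["atom"] == "C"]  # full scan, O(n)

# Database approach: load once, then let SQLite find matching rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (atom TEXT, position INTEGER, mutation_prob REAL)")
con.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [(r["atom"], r["position"], r["mutation_prob"]) for r in rows],
)
db_hit = con.execute("SELECT * FROM protein WHERE atom = ?", ("C",)).fetchall()
```

The key difference is that the "load once" step happens at import time, not on every request.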
Dwarf Crocodile (OP)
Umm, this makes sense. Which services are cheaper, though:
S3 + CloudFront,
OR
DynamoDB,
OR
GraphQL + Amplify?
Also, these CSV files have so much data; is there any way of transferring the content from CSV to a database, or do I have to do it manually?
Komondor
Yes, SQL databases support CSV import for sure, since CSVs are already in row format.
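For example, loading a CSV into SQLite takes only a few lines with Python's standard library. A sketch, assuming a hypothetical `protein_a.csv` with made-up columns:

```python
import csv
import os
import sqlite3
import tempfile

# Write a tiny stand-in for one protein CSV (normally the file already exists).
path = os.path.join(tempfile.mkdtemp(), "protein_a.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["atom", "position", "mutation_prob"])
    writer.writerows([["N", 1, 0.02], ["C", 2, 0.10]])

con = sqlite3.connect(":memory:")  # use a file path for a persistent DB
con.execute("CREATE TABLE protein_a (atom TEXT, position INTEGER, mutation_prob REAL)")

# Stream the CSV rows straight into the table.
with open(path, newline="") as f:
    reader = csv.DictReader(f)
    con.executemany(
        "INSERT INTO protein_a VALUES (?, ?, ?)",
        ((r["atom"], r["position"], r["mutation_prob"]) for r in reader),
    )
con.commit()

count = con.execute("SELECT COUNT(*) FROM protein_a").fetchone()[0]
```

The sqlite3 command-line shell can do the same without code: `.mode csv` followed by `.import protein_a.csv protein_a`.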
Dwarf Crocodile (OP)
And which one should I use? DynamoDB or something else?
Komondor
DynamoDB is a key-value store, a NoSQL database.
Dwarf Crocodile (OP)
According to you, what would be the best option, considering the pricing of the services?
Komondor
A SQL database for sure.
I'm not familiar with Amplify, and S3 + CloudFront isn't SQL.
(unnamed user)
Databases are waaaay more effective, so in almost every case you will be cheaper with a database.
Dwarf Crocodile (OP)
I am already using SQLite as a database to store user credentials and stuff.
So should I add the CSV files' data to it somewhere?
Giant Angora
You can use MySQL or Postgres.
Komondor
Yes, SQLite should work for you.
Dwarf Crocodile (OP)
Will I face any performance issues if I do it in SQLite, considering that in the future there will be more CSV files (a typical CSV file is about 25 MB)?
Komondor
How many MB are you fetching in a single request?
Dwarf Crocodile (OP)
It depends on what option the user selects; there are CSV files ranging from 5-50 MB.
I am basically querying those CSV files: wherever my required data is, I find it and then show it on the frontend.
Actually, I don't want to give full access to my CSV files. Otherwise I would have let users download the CSVs and compute things on their own. But the data in the CSVs was generated by our ML models, and we don't want to make it public.
Komondor
Right, the file is 5-50 MB; you can compare that to a SQL table. But you're not returning the entire file's worth of data in a single request, are you?
So you'd query the database table (the file) and only return some of it.
Any SQL database can do this with ease.
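A sketch of what "only return some of it" looks like against a 10k-row table (row count and column names are assumptions based on the thread):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (atom TEXT, position INTEGER, mutation_prob REAL)")

# 10,000 rows, roughly the row count mentioned in the thread.
con.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [("C" if i % 100 == 0 else "N", i, 0.01) for i in range(10_000)],
)

# Only the rows and columns the frontend needs come back:
# 100 rows out of 10,000 instead of the whole file.
subset = con.execute(
    "SELECT position, mutation_prob FROM protein WHERE atom = ?",
    ("C",),
).fetchall()
```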
Dwarf Crocodile (OP)
So any SQL DB will save me time compared to S3.
Any idea how fast that would be?
(unnamed user)
Yes, it will be extremely fast compared to your current method.
Serbian Hound
@Dwarf Crocodile if you index properly, it will be significantly faster.
Dwarf Crocodile (OP)
For the CSV files, there are like 10k rows.
Serbian Hound
You probably don't even need to index specific fields; it'll just be much faster.
Yes, a DB is designed exactly for that.
Dwarf Crocodile (OP)
Any tutorials on how to do it?
Serbian Hound
Flat-file CSV isn't made for querying.
You have no SQL/MySQL knowledge?
Dwarf Crocodile (OP)
Not enough. I just know how to make a schema and store stuff.
Serbian Hound
That's all you need, plus a query.
Just make it basic; it will be fast enough, I'm sure.
Dwarf Crocodile (OP)
idk anything about indexing.
Serbian Hound
Well, even if you just have an incremental ID as your primary key index, and that's it, it'll still be way faster.
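Indexing really is that small a step. A hedged SQLite sketch (table and index names invented); `EXPLAIN QUERY PLAN` confirms the index actually gets used:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (id INTEGER PRIMARY KEY, atom TEXT, position INTEGER)")
con.executemany(
    "INSERT INTO protein (atom, position) VALUES (?, ?)",
    [("C" if i % 2 else "N", i) for i in range(1_000)],
)

# One line turns full-table scans on `atom` into index lookups.
con.execute("CREATE INDEX idx_protein_atom ON protein (atom)")

# Ask SQLite how it will run the query; the plan should mention the index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM protein WHERE atom = ?", ("C",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

n_carbon = con.execute(
    "SELECT COUNT(*) FROM protein WHERE atom = ?", ("C",)
).fetchone()[0]
```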
Dwarf Crocodile (OP)
And what is Redis? For caching and stuff?
Serbian Hound
That's a use case for Redis, yeah. Think of Redis as an in-memory solution,
so you can use it for your sessions and stuff like that.
Your data sounds relational, so I would definitely suggest a SQL DB solution.
Dwarf Crocodile (OP)
No need for caching?
Serbian Hound
No, not unless you see a need.
And if you do, you don't need Redis just to cache;
you can cache on your server.
For example, Next.js caches fetch endpoints anyway, iirc.
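The same "cache on your server" idea works in any Python backend with `functools.lru_cache`. A minimal sketch, assuming a hypothetical `protein_rows` function keyed by the dropdown selection (the table and call counter are invented to demonstrate the cache):

```python
import sqlite3
from functools import lru_cache

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (atom TEXT, position INTEGER)")
con.execute("INSERT INTO protein VALUES ('C', 1)")

calls = {"db_queries": 0}  # just to show the cache working

@lru_cache(maxsize=32)
def protein_rows(atom: str):
    """Return rows for one dropdown selection, caching repeat requests."""
    calls["db_queries"] += 1
    return tuple(
        con.execute("SELECT position FROM protein WHERE atom = ?", (atom,)).fetchall()
    )

first = protein_rows("C")
second = protein_rows("C")  # served from the cache; no second query
```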
Dwarf Crocodile (OP)
I am using React and Flask,
and a SQLite DB.
Serbian Hound
You should convert those CSVs to SQL
and store them in your DB.
Query it and see how much faster it is.
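"Query it and see" can be simulated in one script: build an in-memory table around the row count mentioned in the thread, filter it both ways, and time them. The timings are machine-dependent (so they aren't asserted), and the schema is made up:

```python
import csv
import io
import sqlite3
import time

n = 10_000  # about the row count of a typical file here
rows = [("C" if i % 100 == 0 else "N", i, 0.01) for i in range(n)]

# CSV path: parse the whole file, then filter in Python.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["atom", "position", "mutation_prob"])
writer.writerows(rows)

t0 = time.perf_counter()
parsed = list(csv.DictReader(io.StringIO(buf.getvalue())))
csv_hits = [r for r in parsed if r["atom"] == "C"]
csv_time = time.perf_counter() - t0

# SQL path: with the one-time import done, filtering is a single query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (atom TEXT, position INTEGER, mutation_prob REAL)")
con.executemany("INSERT INTO protein VALUES (?, ?, ?)", rows)

t0 = time.perf_counter()
sql_hits = con.execute("SELECT * FROM protein WHERE atom = ?", ("C",)).fetchall()
sql_time = time.perf_counter() - t0

print(f"csv: {csv_time * 1000:.1f} ms, sql: {sql_time * 1000:.1f} ms")
```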
Dwarf Crocodile (OP)
Hmm, ok, will try.
Serbian Hound
Not the same thing, but I have an app that generates PDFs on the fly.
Previously I was generating them each time, because I thought they were small,
but now I store them in the DB.
Generating and serving took like a second; a DB lookup is instant,
like a few ms.
Dwarf Crocodile (OP)
Ohhh, that's really good.
Serbian Hound
Yeah, trust me, when you have large data,
a database is the best.
Dwarf Crocodile (OP)
Okk, and I just need to make a separate table for each CSV file?
Serbian Hound
Yeah, and think about the relationships between the tables.
It doesn't have to be anything special.
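One gotcha with the table-per-CSV layout: table names can't be bound as SQL parameters, so the dropdown value has to be validated against a whitelist before it is interpolated into the query. A hedged sketch (the table names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical setup: one table per protein CSV, as discussed above.
for table in ("protein_a", "protein_b"):
    con.execute(f"CREATE TABLE {table} (atom TEXT, position INTEGER)")
con.execute("INSERT INTO protein_a VALUES ('C', 1)")
con.execute("INSERT INTO protein_b VALUES ('N', 2)")

# Whitelist of known tables; never interpolate raw user input into SQL.
TABLES = {"protein_a", "protein_b"}

def rows_for(protein: str):
    """Return all rows for one dropdown selection, rejecting unknown names."""
    if protein not in TABLES:
        raise ValueError(f"unknown protein: {protein}")
    return con.execute(f"SELECT atom, position FROM {protein}").fetchall()
```

Column values, by contrast, should always go through `?` placeholders as in the earlier examples.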
Dwarf Crocodile (OP)
Umm, ok. thnx 🫡
Serbian Hound
np
gl noob, let me know how it goes
Dwarf Crocodile (OP)
I will ask you guys if I get any doubts while working on this.
Serbian Hound
👍
Dwarf Crocodile (OP)
What do you do btw?
Serbian Hound
wym, for work?
I'm a frontend dev,
but I like backend also lol
Dwarf Crocodile (OP)
Good.
Hey, one more doubt:
2-3 of my CSV files have about 20k rows.
Ain't that wayy too much?
Serbian Hound
No wonder it's slow af lol.
You have to remember...
Dwarf Crocodile (OP)
hmm
Serbian Hound
...this is exactly what a DB is designed for.
20k is not a lot by DB standards.
Dwarf Crocodile (OP)
ohh
Serbian Hound
But for CSV it definitely is.
Dwarf Crocodile (OP)
So does querying a 100-row DB table take a similar time to querying a 10k-row DB table?
I mean, will querying 10k rows still take under 1 sec?
Serbian Hound
Depends on whether you're querying an index, in which case yes; in most cases you're not doing that,
but it won't be slow.
Yeah, it should be very, very fast.
Dwarf Crocodile (OP)
Cause right now it takes about 7-10 sec.
Hmm, cool.
Serbian Hound
Every time you're querying it? That's insane.
SQL will be much faster.
Dwarf Crocodile (OP)
Got it.
Dwarf Crocodile (OP)
Should def. use it then.
Serbian Hound
There's someone here handling 500 million rows on a pretty old PC.
Dwarf Crocodile (OP)
Waow.
Like, the most scalable leaderboard systems use Redis to show rankings in realtime.
idk for what reason they use Redis.
Serbian Hound
Redis is useful because it's in memory.
Dwarf Crocodile (OP)
But yeah, they also handle pretty large data.
I don't get it... "in memory"?
Serbian Hound
A leaderboard constantly changes and needs computation; your data is .csv files, so it won't even change.
In memory = RAM.
If something is a file, it's not stored in your RAM; it's on your hard drive.
You get it?
Dwarf Crocodile (OP)
Yes, yes. Ohh.
Serbian Hound
Like when you are on Google Chrome, the tabs you have open are "in memory".
Dwarf Crocodile (OP)
Hmm... Chrome tabs take up a lot of my RAM 😦
Serbian Hound
haha lol same